1620679 - crashtest layout/printing/crashtests/793844.html consistently asserts with Fission (Overwriting an existing document channel, Wrong Document Channel)

Reporter

Description

•

6 years ago

layout/printing/crashtests/793844.html
pretty consistently either asserts or crashes
application crashed [@ mozilla::ProfilerParent::CreateForProcess(int)]

[Child 2666, Main Thread] ###!!! ASSERTION: Wrong Document Channel: 'request == mDocumentRequest', file /builds/worker/workspace/build/src/uriloader/base/nsDocLoader.cpp, line 1500
[Child 2666, Main Thread] ###!!! ASSERTION: Overwriting an existing document channel!: '(loadFlags & nsIChannel::LOAD_REPLACE) || !(mDocumentRequest.get())', file /builds/worker/workspace/build/src/uriloader/base/nsDocLoader.cpp, line 428

BugBot [:suhaib / :marco/ :calixte]

Comment 1

•

6 years ago

The priority flag is not set for this bug.
:neha, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(nkochar)

Andrew McCreight [:mccr8]

Updated

•

6 years ago

Flags: needinfo?(nkochar)

Priority: -- → P2

Timothy Nikkel (:tnikkel)

Comment 2

•

5 years ago

It no longer crashes, just assserts.

Summary: crashtest layout/printing/crashtests/793844.html consistently asserts or crashes with Fission [@ mozilla::ProfilerParent::CreateForProcess( → crashtest layout/printing/crashtests/793844.html consistently asserts with Fission (Overwriting an existing document channel, Wrong Document Channel)

Matt Woodrow (:mattwoodrow)

Assignee

Comment 3

•

5 years ago

This test has a cross-origin iframe element, and in the element's onload handler (not the load handler for the inner Document), it attempts to reload the test.

We successfully load once, waiting for the OOP iframe to load, and then fire the load event to the iframe element (triggering a reload), and the load event to the Document (triggering the reftest harness to start advancing to the next test).

The reftest harness then starts a load of a data URI for the between-test state (from RecvClear). This calls nsDocShell::InternalLoad, which calls Stop() to stop any currently in-progress loads before starting the new load.

In this case, the reload is usually in-progress, and we've usually finished loading the main document, but we're waiting on the OOP iframe to load (which takes longer with fission, since it involves a process switch).

Stopping this load cancels the iframe load, and then since the outer document load had already completed successfully, we fire the load event. I think this is intentional, since we generally don't want subresource load failures to prevent the document load event from firing.

The reftest harness handles this load event, and tries to start the data uri load a second time (here - https://searchfox.org/mozilla-central/rev/b9a814e53b3b6f5cb665a78f4777868e7a16bfcd/layout/tools/reftest/reftest-content.js#1048). At this point we're recursing into nsDocShell::InternalLoad a second time (the original InternalLoad, Stop, and load() are all on the stack).

This creates a document channel for the inner data: uri load, we unwind the stack to the outer InternalLoad call and create a document channel for the outer data: uri, and now we have two document loads being attempted simultaneously (and these assertions).

Crashes on docshell/base/crashtests/1331295.html appear to at least partially be the same issue (though the recursive load call happens within a load event handler on the test, not one on the reftest harness).

I think the best solution here is not fire a load event if we Stop() a load, even if it happens while only subresources are still pending. This appears to match what chrome does, and what the html5 spec says (no mention of firing load events) - https://www.w3.org/TR/html52/browsers.html#aborting-a-document-load

Unfortunately doing so means that this test now never completes, since it reloads itself before the load event, and the reftest harness waits forever without a load event.

I don't think we fire error events in this case (maybe we should?), or maybe we should try to fix the test so that it actually loads.

Matt Woodrow (:mattwoodrow)

Assignee

Updated

•

5 years ago

Assignee: nobody → matt.woodrow

Matt Woodrow (:mattwoodrow)

Assignee

Comment 4

•

5 years ago

Attached file Bug 1620679 - Don't fire load event from within Stop(). r?smaug — Details

This matches what the spec says, and what blink does.

Matt Woodrow (:mattwoodrow)

Assignee

Comment 5

•

5 years ago

Attached file Bug 1620679 - Use web progress listener for detecting load end in reftest-content.js. r?kmag!,tnikkel! — Details

The previous patch stops us from firing the load event if we abort a load. We have a few crashtests that abort loads (either by directly calling stop(), or by starting a new navigation before load completes).

This switches the reftest harness to use web progress to determine when we've finished loading a test.

Depends on D73994

Phabricator Automation

Updated

•

5 years ago

Attachment #9146047 - Attachment description: Bug 1620679 - Don't fire load event from within Stop(). r?nika → Bug 1620679 - Don't fire load event from within Stop(). r?smaug

Matt Woodrow (:mattwoodrow)

Assignee

Updated

•

5 years ago

Blocks: 1634598

Pascal Chevrel:pascalc

Updated

•

5 years ago

status-firefox77: --- → affected

tracking-firefox77: --- → +

Julien Cristau [:jcristau]

Comment 6

•

5 years ago

Is this ready to land Matt?

Chris Peterson [:cpeterson]

Reporter

Comment 7

•

5 years ago

(In reply to Julien Cristau [:jcristau] from comment #6)

Is this ready to land Matt?

Fission Milestone: M6 → M6a

status-firefox76: --- → ?

status-firefox78: --- → affected

status-firefox-esr68: --- → unaffected

Flags: needinfo?(matt.woodrow)

Pulsebot

Comment 8

•

5 years ago

Pushed by mwoodrow@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/60df74b03f34 Don't fire load event from within Stop(). r=smaug https://hg.mozilla.org/integration/autoland/rev/2957142243c5 Use web progress listener for detecting load end in reftest-content.js. r=kmag,tnikkel

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 9

•

5 years ago

Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/23665 for changes under testing/web-platform/tests

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 10

•

5 years ago

Upstream web-platform-tests status checks passed, PR will merge once commit reaches central.

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 11

•

5 years ago

Upstream PR was closed without merging

Alexandru Michis [:malexandru]

Comment 12

•

5 years ago

Backed out 2 changesets for causing failures in nsDocShell.cpp

Backout link: https://hg.mozilla.org/integration/autoland/rev/7e945d67d9816a5939f6564dcf338f82040741b8

Push with failures:
https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedTaskRun=bKBdDhNcRBWLhSMLaZB7GA-0&resultStatus=testfailed%2Cbusted%2Cexception%2Csuccess%2Cretry%2Cusercancel%2Crunnable&searchStr=android%2C7.0%2Cx86-64%2Cdebug%2Creftests%2Ctest-android-em-7.0-x86_64%2Fdebug-geckoview-crashtest-e10s%2Cr%28c%29&fromchange=59cdba115447e5687ee020224923fbefd6ed19e4&tochange=7e945d67d9816a5939f6564dcf338f82040741b8

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=302715236&repo=autoland&lineNumber=16707

There is also a bc failure that might be a regression as well:
https://treeherder.mozilla.org/#/jobs?repo=autoland&fromchange=62ad26fbfaf8cd6ee331b5d79d6933fa10723bf2&searchStr=Windows%2C10%2Cx64%2CWebRender%2Cdebug%2CMochitests%2Ctest-windows10-64-qr%2Fdebug-mochitest-browser-chrome-e10s-1%2CM%28bc1%29&tochange=7e945d67d9816a5939f6564dcf338f82040741b8&selectedTaskRun=XwMQf615R9iCQtH8OnpxEA-0
Though it's still running, will update after it finishes.

Matt Woodrow (:mattwoodrow)

Assignee

Updated

•

5 years ago

Blocks: 1639080

Pascal Chevrel:pascalc

Comment 13

•

5 years ago

[Tracking Requested - why for this release]:

status-firefox77: affected → wontfix

tracking-firefox77: + → ---

Pulsebot

Comment 14

•

5 years ago

Pushed by mwoodrow@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/4ca2912d7421 Don't fire load event from within Stop(). r=smaug https://hg.mozilla.org/integration/autoland/rev/8190c67514cc Use web progress listener for detecting load end in reftest-content.js. r=kmag,tnikkel

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 15

•

5 years ago

Upstream web-platform-tests status checks passed, PR will merge once commit reaches central.

Atila Butkovits

Comment 16

•

5 years ago

Backed out for Reftest perma failures.

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedTaskRun=b1hNExYLTrKue-Wru_UWvg-0&resultStatus=pending%2Crunning%2Csuccess%2Ctestfailed%2Cbusted%2Cexception&revision=8190c67514ccb96f859287e833b237e3af10cdc6

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=303722989&repo=autoland&lineNumber=2218

Backout: https://hg.mozilla.org/integration/autoland/rev/4da52a3b8dfd3a53276c8a5d37ebdbbf0d4b37f6

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 17

•

5 years ago

Upstream PR was closed without merging

Timothy Nikkel (:tnikkel)

Comment 18

•

5 years ago

Right after the call to set the resolution to 2 on the presshell there is a call to set it to 1.

Timothy Nikkel (:tnikkel)

Comment 19

•

5 years ago

Oh, and using Apz instead of MainThreadRestore at

https://searchfox.org/mozilla-central/rev/9aa7bebfd169bc2ead00ef596498a406e56bbb85/dom/base/nsDOMWindowUtils.cpp#574

fixes the problem. So the apzc is overriding the test probably.

Timothy Nikkel (:tnikkel)

Comment 20

•

5 years ago

With the patch

GeckoMVMContext::SetResolutionAndScaleTo call SetResolutionAndScaleTo on the presshell with resolution = 1
then reftest harness calls SetResolutionAndScaleTo on the presshell is resolution 2
GeckoMVMContext::SetResolutionAndScaleTo call SetResolutionAndScaleTo on the presshell with resolution = 1

without the patch

then reftest harness calls SetResolutionAndScaleTo on the presshell is resolution 2

Timothy Nikkel (:tnikkel)

Comment 21

•

5 years ago

Looks like MobileViewportManager chooses a resolution of one on load, then reftest sets resolution to 2, then we tell MobileViewportManager that resolution changed, it recomputes and says resolution should be 1 again. Whereas before these patches the reftest resolution setting happened before MobileViewportManager did anything and that short circuits all that I guess.

Timothy Nikkel (:tnikkel)

Comment 22

•

5 years ago

Before the patch in this bug we call PresShell::SetResolutionAndScaleTo from the reftest harness with resolution 2, it calls MobileViewportManager::ResolutionUpdated and it calls SetRestoreResolution here

https://searchfox.org/mozilla-central/rev/9aa7bebfd169bc2ead00ef596498a406e56bbb85/layout/base/MobileViewportManager.cpp#117

because mPainted is false.

After the patches in this bug the PresShell::SetResolutionAndScaleTo call from the reftest harness with resolution 2 happens slightly later, so that the load event trigger calls SetInitialViewport, and sets mPainted to true. Then we get the PresShell::SetResolutionAndScaleTo call from the reftest harness with resolution 2 but we do not call SetRestoreResolution in MobileViewportManager::ResolutionUpdated. Then the root scroll frame gets reflowed and it's reflow finished callback (ScrollFrameHelper::ReflowFinished) calls MobileViewportManager::ShrinkToDisplaySizeIfNeeded which calls MobileViewportManager::UpdateResolution. UpdateResolution proceeds differently because we now don't have a restore resolution set and so we hit this if (instead of the else)

https://searchfox.org/mozilla-central/rev/9aa7bebfd169bc2ead00ef596498a406e56bbb85/layout/base/MobileViewportManager.cpp#404

and that sets our zoom to the intrinsic scale which is 1, wiping out the 2 resolution. (Whereas before the patch the zoom doesn't change, we don't get a newZoom and we don't call SetResolutionAndScaleTo.)

So in summary if the PresShell::SetResolutionAndScaleTo call from the reftest harness happens after load then that resolution doesn't get set as the restore resolution which changes how the UpdateResolution call happens.

Kats, what should happen here? A special test only resolution change origin that is allowed to take precedence?

Flags: needinfo?(kats)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 23

•

5 years ago

After poking around in the history a bit, yeah, it seems like it's been working mostly accidentally. I would introduce a new ResolutionChangeOrigin for tests, use that in DWU::SetResolutionAndScaleTo, and have it take priority.

Another thing that might be worth doing is trying to make tests use meta tags for controlling initial zoom level instead of reftest-resolution attributes. I'm not sure if that always works (e.g. might need to set prefs for desktop) but it would probably be more representative of real-world pages.

Flags: needinfo?(kats)

Timothy Nikkel (:tnikkel)

Updated

•

5 years ago

Depends on: 1642088

Botond Ballo [:botond]

Comment 24

•

5 years ago

•

Edited

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #23)

Another thing that might be worth doing is trying to make tests use meta tags for controlling initial zoom level instead of reftest-resolution attributes. I'm not sure if that always works (e.g. might need to set prefs for desktop) but it would probably be more representative of real-world pages.

Actually, we introduced reftest-resolution specifically so that we can write tests that exercise the main-thread zooming codepaths without requiring mobile viewport sizing. I wrote an explanation of this in bug 1554790 which introduced the attribute:

However, processing a meta viewport tag has side effects beyond setting a resolution: it triggers the mobile viewport sizing logic added in bug 1423013 and relatives.

For desktop zooming, we'd like to be able to test painting with a resolution without triggering the mobile viewport sizing logic. (For at least two reasons: (1) we want to be testing configurations that are as close as possible to what happens in production; and (2) the mobile viewport sizing logic has interactions with layout scrollbars and I'd rather not support the combination if I don't have to.)

Timothy Nikkel (:tnikkel)

Comment 25

•

5 years ago

Just commenting here so it's easier to find in the future: bug 1642088 is for the issue with resolution.

Pulsebot

Comment 26

•

5 years ago

Pushed by mwoodrow@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9cec607a0ea7 Don't fire load event from within Stop(). r=smaug https://hg.mozilla.org/integration/autoland/rev/2284f261a168 Use web progress listener for detecting load end in reftest-content.js. r=kmag,tnikkel

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 27

•

5 years ago

Upstream web-platform-tests status checks passed, PR will merge once commit reaches central.

Razvan Maries

Comment 28

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/9cec607a0ea7
https://hg.mozilla.org/mozilla-central/rev/2284f261a168

Status: NEW → RESOLVED

Closed: 5 years ago

status-firefox79: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla79

Ryan VanderMeulen [:RyanVM]

Updated

•

5 years ago

status-firefox76: ? → wontfix

Flags: in-testsuite+

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 29

•

5 years ago

Upstream PR merged by moz-wptsync-bot

Florin Strugariu [:Bebe]

Updated

•

5 years ago

Regressions: 1642998

Matt Woodrow (:mattwoodrow)

Assignee

Updated

•

5 years ago

Flags: needinfo?(matt.woodrow)

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Updated

•

5 years ago

Regressions: 1643204

Ryan VanderMeulen [:RyanVM]

Updated

•

5 years ago

status-firefox78: affected → wontfix

Timothy Nikkel (:tnikkel)

Updated

•

5 years ago

Regressions: 1655677

Bug 1620679 - Don't fire load event from within Stop(). r?smaug 5 years ago Matt Woodrow (:mattwoodrow) 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1620679 - Use web progress listener for detecting load end in reftest-content.js. r?kmag!,tnikkel! 5 years ago Matt Woodrow (:mattwoodrow) 47 bytes, text/x-phabricator-request		Details \| Review