Closed Bug 1620679 Opened 9 months ago Closed 6 months ago

crashtest layout/printing/crashtests/793844.html consistently asserts with Fission (Overwriting an existing document channel, Wrong Document Channel)

Categories

(Core :: DOM: Content Processes, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla79
Fission Milestone M6a
Tracking Status
firefox-esr68 --- unaffected
firefox76 --- wontfix
firefox77 --- wontfix
firefox78 --- wontfix
firefox79 --- fixed

People

(Reporter: cpeterson, Assigned: mattwoodrow)

References

(Blocks 1 open bug, Regressed 2 open bugs)

Details

Attachments

(2 files)

layout/printing/crashtests/793844.html
pretty consistently either asserts or crashes
application crashed [@ mozilla::ProfilerParent::CreateForProcess(int)]

[Child 2666, Main Thread] ###!!! ASSERTION: Wrong Document Channel: 'request == mDocumentRequest', file /builds/worker/workspace/build/src/uriloader/base/nsDocLoader.cpp, line 1500
[Child 2666, Main Thread] ###!!! ASSERTION: Overwriting an existing document channel!: '(loadFlags & nsIChannel::LOAD_REPLACE) || !(mDocumentRequest.get())', file /builds/worker/workspace/build/src/uriloader/base/nsDocLoader.cpp, line 428

The priority flag is not set for this bug.
:neha, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(nkochar)
Flags: needinfo?(nkochar)
Priority: -- → P2

It no longer crashes, just assserts.

Summary: crashtest layout/printing/crashtests/793844.html consistently asserts or crashes with Fission [@ mozilla::ProfilerParent::CreateForProcess( → crashtest layout/printing/crashtests/793844.html consistently asserts with Fission (Overwriting an existing document channel, Wrong Document Channel)

This test has a cross-origin iframe element, and in the element's onload handler (not the load handler for the inner Document), it attempts to reload the test.

We successfully load once, waiting for the OOP iframe to load, and then fire the load event to the iframe element (triggering a reload), and the load event to the Document (triggering the reftest harness to start advancing to the next test).

The reftest harness then starts a load of a data URI for the between-test state (from RecvClear). This calls nsDocShell::InternalLoad, which calls Stop() to stop any currently in-progress loads before starting the new load.

In this case, the reload is usually in-progress, and we've usually finished loading the main document, but we're waiting on the OOP iframe to load (which takes longer with fission, since it involves a process switch).

Stopping this load cancels the iframe load, and then since the outer document load had already completed successfully, we fire the load event. I think this is intentional, since we generally don't want subresource load failures to prevent the document load event from firing.

The reftest harness handles this load event, and tries to start the data uri load a second time (here - https://searchfox.org/mozilla-central/rev/b9a814e53b3b6f5cb665a78f4777868e7a16bfcd/layout/tools/reftest/reftest-content.js#1048). At this point we're recursing into nsDocShell::InternalLoad a second time (the original InternalLoad, Stop, and load() are all on the stack).

This creates a document channel for the inner data: uri load, we unwind the stack to the outer InternalLoad call and create a document channel for the outer data: uri, and now we have two document loads being attempted simultaneously (and these assertions).

Crashes on docshell/base/crashtests/1331295.html appear to at least partially be the same issue (though the recursive load call happens within a load event handler on the test, not one on the reftest harness).

I think the best solution here is not fire a load event if we Stop() a load, even if it happens while only subresources are still pending. This appears to match what chrome does, and what the html5 spec says (no mention of firing load events) - https://www.w3.org/TR/html52/browsers.html#aborting-a-document-load

Unfortunately doing so means that this test now never completes, since it reloads itself before the load event, and the reftest harness waits forever without a load event.

I don't think we fire error events in this case (maybe we should?), or maybe we should try to fix the test so that it actually loads.

Assignee: nobody → matt.woodrow

This matches what the spec says, and what blink does.

The previous patch stops us from firing the load event if we abort a load. We have a few crashtests that abort loads (either by directly calling stop(), or by starting a new navigation before load completes).

This switches the reftest harness to use web progress to determine when we've finished loading a test.

Depends on D73994

Attachment #9146047 - Attachment description: Bug 1620679 - Don't fire load event from within Stop(). r?nika → Bug 1620679 - Don't fire load event from within Stop(). r?smaug
Blocks: 1634598

Is this ready to land Matt?

(In reply to Julien Cristau [:jcristau] from comment #6)

Is this ready to land Matt?

Fission Milestone: M6 → M6a
Flags: needinfo?(matt.woodrow)
Pushed by mwoodrow@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/60df74b03f34
Don't fire load event from within Stop(). r=smaug
https://hg.mozilla.org/integration/autoland/rev/2957142243c5
Use web progress listener for detecting load end in reftest-content.js. r=kmag,tnikkel
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/23665 for changes under testing/web-platform/tests
Upstream web-platform-tests status checks passed, PR will merge once commit reaches central.
Upstream PR was closed without merging
Blocks: 1639080

[Tracking Requested - why for this release]:

Pushed by mwoodrow@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4ca2912d7421
Don't fire load event from within Stop(). r=smaug
https://hg.mozilla.org/integration/autoland/rev/8190c67514cc
Use web progress listener for detecting load end in reftest-content.js. r=kmag,tnikkel
Upstream web-platform-tests status checks passed, PR will merge once commit reaches central.
Upstream PR was closed without merging

Right after the call to set the resolution to 2 on the presshell there is a call to set it to 1.

Oh, and using Apz instead of MainThreadRestore at

https://searchfox.org/mozilla-central/rev/9aa7bebfd169bc2ead00ef596498a406e56bbb85/dom/base/nsDOMWindowUtils.cpp#574

fixes the problem. So the apzc is overriding the test probably.

With the patch

GeckoMVMContext::SetResolutionAndScaleTo call SetResolutionAndScaleTo on the presshell with resolution = 1
then reftest harness calls SetResolutionAndScaleTo on the presshell is resolution 2
GeckoMVMContext::SetResolutionAndScaleTo call SetResolutionAndScaleTo on the presshell with resolution = 1

without the patch

then reftest harness calls SetResolutionAndScaleTo on the presshell is resolution 2

Looks like MobileViewportManager chooses a resolution of one on load, then reftest sets resolution to 2, then we tell MobileViewportManager that resolution changed, it recomputes and says resolution should be 1 again. Whereas before these patches the reftest resolution setting happened before MobileViewportManager did anything and that short circuits all that I guess.

Before the patch in this bug we call PresShell::SetResolutionAndScaleTo from the reftest harness with resolution 2, it calls MobileViewportManager::ResolutionUpdated and it calls SetRestoreResolution here

https://searchfox.org/mozilla-central/rev/9aa7bebfd169bc2ead00ef596498a406e56bbb85/layout/base/MobileViewportManager.cpp#117

because mPainted is false.

After the patches in this bug the PresShell::SetResolutionAndScaleTo call from the reftest harness with resolution 2 happens slightly later, so that the load event trigger calls SetInitialViewport, and sets mPainted to true. Then we get the PresShell::SetResolutionAndScaleTo call from the reftest harness with resolution 2 but we do not call SetRestoreResolution in MobileViewportManager::ResolutionUpdated. Then the root scroll frame gets reflowed and it's reflow finished callback (ScrollFrameHelper::ReflowFinished) calls MobileViewportManager::ShrinkToDisplaySizeIfNeeded which calls MobileViewportManager::UpdateResolution. UpdateResolution proceeds differently because we now don't have a restore resolution set and so we hit this if (instead of the else)

https://searchfox.org/mozilla-central/rev/9aa7bebfd169bc2ead00ef596498a406e56bbb85/layout/base/MobileViewportManager.cpp#404

and that sets our zoom to the intrinsic scale which is 1, wiping out the 2 resolution. (Whereas before the patch the zoom doesn't change, we don't get a newZoom and we don't call SetResolutionAndScaleTo.)

So in summary if the PresShell::SetResolutionAndScaleTo call from the reftest harness happens after load then that resolution doesn't get set as the restore resolution which changes how the UpdateResolution call happens.

Kats, what should happen here? A special test only resolution change origin that is allowed to take precedence?

Flags: needinfo?(kats)

After poking around in the history a bit, yeah, it seems like it's been working mostly accidentally. I would introduce a new ResolutionChangeOrigin for tests, use that in DWU::SetResolutionAndScaleTo, and have it take priority.

Another thing that might be worth doing is trying to make tests use meta tags for controlling initial zoom level instead of reftest-resolution attributes. I'm not sure if that always works (e.g. might need to set prefs for desktop) but it would probably be more representative of real-world pages.

Flags: needinfo?(kats)
Depends on: 1642088

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #23)

Another thing that might be worth doing is trying to make tests use meta tags for controlling initial zoom level instead of reftest-resolution attributes. I'm not sure if that always works (e.g. might need to set prefs for desktop) but it would probably be more representative of real-world pages.

Actually, we introduced reftest-resolution specifically so that we can write tests that exercise the main-thread zooming codepaths without requiring mobile viewport sizing. I wrote an explanation of this in bug 1554790 which introduced the attribute:

However, processing a meta viewport tag has side effects beyond setting a resolution: it triggers the mobile viewport sizing logic added in bug 1423013 and relatives.

For desktop zooming, we'd like to be able to test painting with a resolution without triggering the mobile viewport sizing logic. (For at least two reasons: (1) we want to be testing configurations that are as close as possible to what happens in production; and (2) the mobile viewport sizing logic has interactions with layout scrollbars and I'd rather not support the combination if I don't have to.)

Just commenting here so it's easier to find in the future: bug 1642088 is for the issue with resolution.

Pushed by mwoodrow@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9cec607a0ea7
Don't fire load event from within Stop(). r=smaug
https://hg.mozilla.org/integration/autoland/rev/2284f261a168
Use web progress listener for detecting load end in reftest-content.js. r=kmag,tnikkel
Upstream web-platform-tests status checks passed, PR will merge once commit reaches central.
Status: NEW → RESOLVED
Closed: 6 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla79
Flags: in-testsuite+
Upstream PR merged by moz-wptsync-bot
Flags: needinfo?(matt.woodrow)
Regressions: 1655677
You need to log in before you can comment on or make changes to this bug.