Closed Bug 1718761 Opened 4 years ago Closed 4 years ago

[Fission] Autoscroll doesn't work with (cross-origin?) iframes

Categories

(Core :: Panning and Zooming, defect)

Firefox 91
Unspecified
All
defect

Tracking

()

RESOLVED FIXED
92 Branch
Fission Milestone MVP
Tracking Status
firefox-esr78 --- disabled
firefox89 --- disabled
firefox90 --- disabled
firefox91 --- disabled
firefox92 --- fixed

People

(Reporter: mmis1000, Assigned: u608768)

References

(Blocks 2 open bugs)

Details

(Whiteboard: fission-soft-blocker)

Attachments

(4 files, 1 obsolete file)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0

Steps to reproduce:

  1. Open css snap example at https://developer.mozilla.org/en-US/docs/Web/CSS/scroll-snap-type
  2. middle click on one of the example, make the cursor show the scroll indicator
  3. drag the mouse to make it scroll

Actual results:

It didn't scroll at all.

Expected results:

It scrolls normally (in proximity example)
It scrolls to next anchor (in mandatory example)

Juts like what chrome did

There are two problems.
#1 Middle-click scrolling works after the second click. (see Bug 1719031)
This seems to be implementation bug of Firefox. Because the #1 problem can be reproduced in Firefox 68.

#2 Middle-click scrolling does not work at all, if Fission Nightly experiments is enabled.

I set tracking flags for the #2 problem.

Blocks: fission
Status: UNCONFIRMED → NEW
Has Regression Range: --- → no
Has STR: --- → yes
Component: Untriaged → Layout
Ever confirmed: true
Keywords: regression
Product: Firefox → Core
Summary: CSS snap breaks scrolling via middle click and drag → [Fission] CSS snap breaks scrolling via middle click and drag

It works fine both on Linux and Mac. I don't think scroll-snap-type is related.

Blocks: 1597765
Component: Layout → General
OS: Unspecified → Windows 10
Product: Core → Firefox
Summary: [Fission] CSS snap breaks scrolling via middle click and drag → [Fission] Autoscroll doesn't work on Windows
Fission Milestone: --- → ?

I was able to reproduce on my end. I'll use mozregression to provide the regression range ASAP.
Best,
Clara

:mmis1000, since this bug is a regression, could you fill (if possible) the regressed_by field?
For more information, please visit auto_nag documentation.

Flags: needinfo?(mmis1000)

It seems that breaks on all version as long as web fission is enabled.
So it is regressed by "web fission".

Flags: needinfo?(mmis1000)

If you add fission.autostart to the moz-regression config ,then no build of firefox world scroll correctly on css snap enabled container

Config used

URL: https://developer.mozilla.org/en-US/docs/Web/CSS/scroll-snap-type

Start date: 2019/07/03
End date: 2021/07/03

Prefs:

fission.autostart true
gfx.webrender.all true

according to https://wiki.mozilla.org/Project_Fission

Consider it breaks even on scroll-snap-type: none;
probably it breaks on all container?

But I can auto scroll on youtube side bar just fine, that really don't make sense.

Probably what breaks it actually is iframe?
It turns out MDN wraps every examples in an iframe.

See Also: → 1719031

https://output.jsbin.com/tojemet

This issue can be reproduced with a iframe to //example.com/ under any version of firefox with web fission enabled

Keywords: regression

(CCing masayuki, since he might recall a thing or two about autoscroll.)

I can't reproduce this. If I enable middle clicking on Windows, autoscroll works in Fission. Or is this bug only about css snap or about cross-origin iframes? I think the latter.

Summary: [Fission] Autoscroll doesn't work on Windows → [Fission] Autoscroll doesn't work on Windows with (cross-origin?) iframes
Fission Milestone: ? → M8
Summary: [Fission] Autoscroll doesn't work on Windows with (cross-origin?) iframes → [Fission] Autoscroll doesn't work with (cross-origin?) iframes

According to the debugger, JSActor::SendQuery in AutoScrollChild.startScroll returned false and null. But I see the autoscroller popup. So, I guess that AutoScrollParent does not return the result to proper child process.

Gijs: WDYT?

Flags: needinfo?(gijskruitbosch+bugs)

(In reply to Masayuki Nakano [:masayuki] (he/him)(JST, +0900) from comment #15)

According to the debugger, JSActor::SendQuery in AutoScrollChild.startScroll returned false and null

I'm confused - which part returns null? None of the return statement in this codepath appear to ever return null.

. But I see the autoscroller popup. So, I guess that AutoScrollParent does not return the result to proper child process.

Gijs: WDYT?

I don't really have thoughts based on this information? Is there a specific question you were hoping I was able to answer? My understanding is that the sendQuery implementation lives in DOM/IPC land, and I would be... surprised... if messages could be returned to the wrong process / frame / actor, as I would have expected this to have been a problem before now, and discovered and fixed. But perhaps there's some kind of edge case...

Are there exceptions in the browser console when this problem occurs? Can't you also stop the browser debugger in the parent process side, ie in AutoscrollParent, and find out what is happening there?

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(masayuki)

Has this ever worked with Fission? 2019-12-29 is broken, yet it should have bug 1597765.
Testing https://output.jsbin.com/tojemet from comment 12.

I think there's something weird going on with preallocated processes here. Autoscrolling the frame works fine in these conditions:

In the broken case, when we attempt to find a target apzc, we appear to have two identical guids: one for the root and one for the subframe. The root is ahead of the subframe in the node list, so we return that, and then autoscroll the root instead of the subframe (adjust the window to be super short to see this). (edit: after debugging some more, I realized this is wrong: we grab the toplevel LayersID from the RemoteLayerTreeOwner, and use that to find the target apzc, which is a problem, but don't actually have duplicate ScrollableLayerGuids).

Now in the working case from comment #18, we just don't find an apzc, so we're not actually using apz to do the scroll, which feels like a separate problem? Either way, in both cases it sounds like there's some bugs with how we're finding the scrollable layer for out of process frames.

That actually matches what I was suspecting during skimming the front-end code. In Fission, to identify a scrollable frame we need a pair of a layersId (https://searchfox.org/mozilla-central/rev/77f0b36028b2368e342c982ea47609040b399d89/dom/interfaces/base/nsIDOMWindowUtils.idl#2215) and a scrollId, but the frond-end code doesn't use layersId at all.

OS: Windows 10 → All

(In reply to Hiroyuki Ikezoe (:hiro) from comment #20)

In Fission, to identify a scrollable frame we need a pair of a layersId (https://searchfox.org/mozilla-central/rev/77f0b36028b2368e342c982ea47609040b399d89/dom/interfaces/base/nsIDOMWindowUtils.idl#2215) and a scrollId, but the frond-end code doesn't use layersId at all.

Gijs, this sounds like a front-end Fission bug (front-end needs to track APZ's layersId?). How do I flag this bug for the Front-end team to triage and prioritize?

Do you think this bug should be fixed for Fission MVP?

Fission Milestone: M8 → MVP
Flags: needinfo?(gijskruitbosch+bugs)

(In reply to Chris Peterson [:cpeterson] from comment #21)

(In reply to Hiroyuki Ikezoe (:hiro) from comment #20)

In Fission, to identify a scrollable frame we need a pair of a layersId (https://searchfox.org/mozilla-central/rev/77f0b36028b2368e342c982ea47609040b399d89/dom/interfaces/base/nsIDOMWindowUtils.idl#2215) and a scrollId, but the frond-end code doesn't use layersId at all.

Gijs, this sounds like a front-end Fission bug (front-end needs to track APZ's layersId?).

I'm not an APZ expert, but (so?) I don't think this would be fixed any faster by front-end people... It looks like the API for autoscroll's layout support ( https://searchfox.org/mozilla-central/rev/a14ecd829fdb1e9780b7c100016c6a3d951cf7da/dom/interfaces/base/nsIRemoteTab.idl#94 ) doesn't take a layers ID, but takes a presshell id. Should that be replaced with the layers ID, or is this an addition? How should the implementation of that method change? All those are probably still questions for the dom/layout/apz folks, and if these APIs need to be made fission-aware, I'm not sure that's a job for the front-end folks. Passing the relevant ID around from the JS in the child to the JS in the parent on the "front-end" side (either in addition or replacing the presshell ID) is probably the trivial part of such a change. In fact, it also seems possible to me that the presshell ID should already be mappable to a layers ID on the apz implementation side? But then, that's because I don't know what a layersId really is...

How do I flag this bug for the Front-end team to triage and prioritize?

I'm sorry but we don't have a good way of doing this - there is no one "front-end team" and autoscroll isn't actively owned by anyone.

Do you think this bug should be fixed for Fission MVP?

I don't know what the criteria are for Fission MVP, or how to find out. I also don't know how frequently autoscroll is used, and of that usage, how frequently it is used in iframes. I checked, and AFAICT we don't have any telemetry that would help us judge this, either. I also don't know what the severity of other fission MVP or other milestone bugs is and how this should be prioritized in a relative fashion. :-(

Bouncing the needinfo to let you make a call on priority.

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(cpeterson)

The layersId basically identifies the child process; it's not implied by the presShellId, it needs to be provided separately.

I had a brief look, and it looks like the current flow for starting an autoscroll is:

  1. AutoScrollChild
  2. (cross child process --> parent process boundary)
  3. AutoScrollParent
  4. browser-custom-element.js
  5. nsIRemoteTab::StartApzAutoscroll (implemented in BrowserHost)
  6. BrowserParent::StartApzAutoscroll
  7. nsIWidget::StartAsyncAutoscroll
  8. (additional steps to get to APZ)

Currently, the layersId is picked up at step 6, here. Note that this is in the parent process.

I think that may have been correct before Fission, but with Fission, I think the information about which child process in the affected tab started the autoscroll is lost by that point.

So, I think the solution here is to pick up the layersId at step 1, using nsIDOMWindowUtils.getLayersId() and propagate it through to step 6.

And I agree with Gijs that based on the investigation so far, this issue is APZ-ish :)

My 2c on prioritization is that we should try to fix this for MVP. If the incorrect layersId is the only issue, the fix should be fairly straightforward.

Component: General → Panning and Zooming
Product: Firefox → Core
Flags: needinfo?(masayuki)

Assigning this bug to Kashav because he says he has a patch.

Assignee: nobody → kmadan
Flags: needinfo?(cpeterson)

And propagate it through to BrowserParent::StartApzAutoscroll.

These don't have a PBrowser actor, so we have to fetch the layersId and start
the autoscroll directly on the widget.

Depends on D120766

Attachment #9232911 - Attachment description: Bug 1718761 - Fetch layersId in the process that initiated the autoscroll, r?botond,Gijs → Bug 1718761 - Move {Start,Stop}ApzAutoscroll to CanonicalBrowsingContext, r?botond,Gijs,hiro
Attachment #9232912 - Attachment is obsolete: true

This bug is a soft blocker for Fission MVP. We'd like to fix it before our Release channel rollout, but we won't delay the rollout waiting for it.

Whiteboard: fission-soft-blocker
Blocks: 1722732
Pushed by kmadan@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/4593c3d66951 Move {Start,Stop}ApzAutoscroll to CanonicalBrowsingContext, r=Gijs,hiro
Flags: needinfo?(kmadan)
Attachment #9232911 - Attachment description: Bug 1718761 - Move {Start,Stop}ApzAutoscroll to CanonicalBrowsingContext, r?botond,Gijs,hiro → Bug 1718761 - Move {Start,Stop}ApzAutoscroll to CanonicalBrowsingContext, r?Gijs,hiro
Pushed by kmadan@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e02e9a1e0b15 Move {Start,Stop}ApzAutoscroll to CanonicalBrowsingContext, r=Gijs,hiro
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 92 Branch
Flags: needinfo?(kmadan)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: