Closed Bug 1266833 Opened 8 years ago Closed 8 years ago

[e10s] Setting overflow:hidden; and height: 100%; on html/body element causes issues if scrolled down the page.

Categories

(Core :: Panning and Zooming, defect)

47 Branch
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla49
Tracking Status
e10s m9+ ---
firefox46 --- unaffected
firefox47 --- wontfix
firefox48 --- fixed
firefox49 --- verified

People

(Reporter: agibson, Assigned: kats)

References

()

Details

(Keywords: correctness, Whiteboard: [gfx-noted])

Attachments

(1 file, 1 obsolete file)

"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:47.0) Gecko/20100101 Firefox/47.0"

STR:

1.) Visit https://www.mozilla.org/en-US/firefox/developer/
2.) Resize the browser window to the smallest width available (around 320px).
3.) Scroll down the page to the "Page Inspector" thumbnail video.
4.) Click the video play thumbnail.

Expected results:

A modal should be display and play the video.

Actual Results:

The page is rendered entirely white. Video can be heard when it starts to play, but is inaccessible and rendered outside of the visible viewport.

A closer inspection shows these properties are set at small viewports:

https://github.com/mozilla/bedrock/blob/master/media/css/base/mozilla-modal.less#L134-L144

It looks like setting overflow:hidden; and height: 100%; on the html element is causing issues in Firefox 47, resulting in the content being rendered outside of the visible viewport.

This same issue is not present in Firefox 45 (current release version). In Firefox 45, the viewport automatically readjusts to the top of the page when the above CSS properties are set. In Firefox 47 (dev edition), the viewport no longer adjusts in the same way.

This issue does not seem to be present in any other browsers tested.
OS: Unspecified → All
Hardware: Unspecified → All
Version: unspecified → 47 Branch
Blocks: 1261017
I can reproduce with e10s, but not in non-e10s. Additionally, turning off apz (with e10s) fixes the issue.
Summary: Setting overflow:hidden; and height: 100%; on html/body element causes issues if scrolled down the page. → [e10s] Setting overflow:hidden; and height: 100%; on html/body element causes issues if scrolled down the page.
User Agent: Mozilla/5.0 (Windows NT 6.1; rv:48.0) Gecko/20100101 Firefox/48.0
Build ID: 20160424030601

I have tested your issue on latest Nightly build (48.0a1) and managed to reproduce it. I have followed the STR from comment 0 and is also reproducible on Windows 7 x32 only with e10s enabled. With e10s disabled, this bug no longer occurs.
tracking-e10s: --- → ?
This looks pretty bad, I can reproduce by playing one of the video and resizing to a smaller window.  Seems to have something to do with apz?
Flags: needinfo?(bugmail.mozilla)
Can repro, will look into it.
Assignee: nobody → bugmail.mozilla
Component: Layout: View Rendering → Panning and Zooming
Flags: needinfo?(bugmail.mozilla)
Keywords: correctness
Whiteboard: [gfx-noted]
Looks like something funny going on with frame reconstruction. The scroll position is changed to (0, 7.05) but the scroll origin is nsGkAtoms::restore. So when we do the paint and ship over the frame metrics to the APZ, it doesn't have the "update scroll offset" flag on it, because "restore" origins aren't allowed to clobber APZ scroll position. Need to look into why a "restore" scroll position is actually changing the scroll position.
So the "restore" scroll position is getting clamped at [1] which is why it's changing the scroll position. Things look like this (on a slightly reduced version of the test page that I made):

1) user scrolls down to (0, 4568) or whatever the scroll position is of the video
2) user clicks on the video link, which triggers a frame reconstruction with a shortened scrollable rect
3) The (0, 4568) scroll position is saved, the frame is reconstructed, and (0, 4568) is restored into mRestorePos.
4) ScrollToRestoredPosition() is called, which tries to scroll to (0, 4568) but it gets clamped to (0, 16). This change in scroll position doesn't get updated in APZ. I fixed it locally by detecting the clamping and changing the scroll origin to nsGkAtoms::other. However another fix might be to just to re-clamp the scroll position on the APZC side in NotifyLayersUpdated if the scrollable rect changes. That's probably a more robust fix.

So that should take care of the problem when the video popup comes up. It doesn't fix the problem when the video popup is dismissed and the reverse problem occurs, because now the main thread goes back to (0, 4568) but the APZC is left at (0, 16). Clamping on the APZC side won't fix that. I traced through the main-thread code for this and it's a bit more involved.

- When step (4) above happens, mLastPos gets left at (0, 16) after we "restore" the scroll position. There's code that that's supposed to clear mRestorePos once the current scroll position is successfully set to it, but that never gets invoked because of the clamping. So mRestorePos is left "dangling" at (0, 4568) while the current pos and mLastPos are both (0, 16).
- The user then dismisses the Youtube video and another frame reconstruction happens.
- The "smart" code at [2] saves (0, 4568) instead of (0, 16), and that's what gets restored and scrolled to with an origin of nsGkAtoms::restore, so it doesn't get sent to APZC. APZC is therefore left at (0, 16) while the displayport is way down at (0, 4568), which results in another "blank screen" issue.

The fundamental issue here seems to be that there's a bunch of code which holds on to the restore position until either the user manually scrolls or the restore position has been restored. And that code is coming into play in this situation where the scroll position is getting artificially clamped to (0, 16) for an extended period of time. I'm trying to figure out if this is correct or incorrect behaviour on the main thread and how to update APZ to deal with it.

[1] http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsGfxScrollFrame.cpp?rev=583cdf2d877c#2176
[2] http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsGfxScrollFrame.cpp?rev=583cdf2d877c#5646
So comparing Chrome and Firefox release (i.e non-APZ), there is a difference in behaviour - in Firefox Release, because of the above-described behaviour, the scroll position goes back to where the video scroll position after you dismiss the video. In Chrome it doesn't, and stays at the top of the page. In general I think that Chrome's behaviour is more correct, because opening the video and then dismissing it are two separate actions. The first should set the scroll position to (0,16) and the second has no reason to put it back to what it was before. It's just a bug in Firefox. So I think the fix for this belongs in the main-thread code, and should involve entering the code path at [1] at the "right time". I'm just not sure how to trigger that without also breaking the restoration of scroll position during incremental page load.
Here's a simple test case that demonstrates the bug in release Firefox. If you scroll down and click on the button, the scroll position will go to 0 and then back to the bottom of the page in Firefox. In Chrome it stays at the top of the page. The Firefox behaviour is basically "leaking to content" whether or not a frame reconstruction happens; if I modify the styles slightly to avoid the reconstruction then the Firefox behaviour matches the Chrome behaviour.

Markus/tn: do you agree that this test case demonstrates a bug in release Firefox? And if so, do you know if there's some way to differentiate in the code between "we can't scroll to mRestorePos because we're still incrementally loading and don't have that content yet" and "we can't scroll to mRestorePos because we're done loading and the content just isn't long enough". I think in the code these two scenarios are basically identical but you guys might know some way to distinguish between them.
Flags: needinfo?(tnikkel)
Flags: needinfo?(mstange)
Also for posterity my reduced version of the original testcase is at http://people.mozilla.org/~kgupta/bug/1266833/
Timothy is a better person ta answer that question than I am. All I have to offer is that I think this shouldn't be an APZ-specific bug and that I'm not happy about APZ treating "restore" scroll position updates in a special way.
Flags: needinfo?(mstange)
Confirmed that the first half of this is a regression from bug 1235899. The second half seems to be a more recent regression, I think, since it didn't appear in the builds I tested around bug 1235899. I'll file a separate bug for that anyway, and see if I can track down a regression window with APZ enabled.
mozregression pointed me to bug 1251638. What I believe is happening is that prior to bug 1251638, APZ was hiding certain incarnations of bug 1268195 (such as the one that occurs on https://www.mozilla.org/en-US/firefox/developer/) by sending an extra repaint request. After bug 1251638, the extra repaint is not being sent, and so things are a little bit worse.

Anyway, the upshot of all this is that there are two bugs, one as described in comment 0 here, and one as described in bug 1268195. This one is APZ-specific, and the other is not.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #7)
> So comparing Chrome and Firefox release (i.e non-APZ), there is a difference
> in behaviour - in Firefox Release, because of the above-described behaviour,
> the scroll position goes back to where the video scroll position after you
> dismiss the video. In Chrome it doesn't, and stays at the top of the page.

I don't see this in Chrome. In both Chrome and Safari the page scrolls back down to show the thumbnail for the page inspector video for me. So I guess the page has some js to save and restore the scroll position for this situation?
(In reply to Markus Stange [:mstange] from comment #11)
> offer is that I think this shouldn't be an APZ-specific bug and that I'm not
> happy about APZ treating "restore" scroll position updates in a special way.

From bug 1235899 comment 18 it sounds like there are cases when we definitely do not want "restore" to clobber apz. But in other cases where a fling isn't active it seems like it should. Should we have a new type of scroll offset update that only clobbers if there isn't something "higher priority" happening (or happened) on the apz side?
(In reply to Timothy Nikkel (:tnikkel) from comment #14)
> I don't see this in Chrome. In both Chrome and Safari the page scrolls back
> down to show the thumbnail for the page inspector video for me. So I guess
> the page has some js to save and restore the scroll position for this
> situation?

Assuming you did this on the URL of the bug, and not the reduced test case - did you narrow the browser window? Without that step the behaviour is completely different; the page uses some reponsive design goop.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #16)
> (In reply to Timothy Nikkel (:tnikkel) from comment #14)
> > I don't see this in Chrome. In both Chrome and Safari the page scrolls back
> > down to show the thumbnail for the page inspector video for me. So I guess
> > the page has some js to save and restore the scroll position for this
> > situation?
> 
> Assuming you did this on the URL of the bug, and not the reduced test case -
> did you narrow the browser window? Without that step the behaviour is
> completely different; the page uses some reponsive design goop.

Yes, the url of the bug, not the reduced testcase. And yes I narrowed the window. I tried on my retina screen and on an external monitor. I see the same behaviour. The page scrolls back down after closing the video in all cases for me.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #9)
> Markus/tn: do you agree that this test case demonstrates a bug in release
> Firefox? And if so, do you know if there's some way to differentiate in the
> code between "we can't scroll to mRestorePos because we're still
> incrementally loading and don't have that content yet" and "we can't scroll
> to mRestorePos because we're done loading and the content just isn't long
> enough". I think in the code these two scenarios are basically identical but
> you guys might know some way to distinguish between them.

I think there are two different uses of this scroll restoration code. (1) When a scroll frame gets reconstructed, and (2) when we are restoring a page from history and want to restore the scroll position. In (1) waiting for more content to load makes no sense, because all the content is already there and we are just reconstructing the scroll frame, we're not going to get anymore content by waiting. (2) seems to be what this code is designed for.

We should figure out how to distinguish these two cases and then for (1) we should try to restore exactly once, then give up. For (2) we should continue trying to restore but maybe give up after the document is finished loading (so that the restore pos is guaranteed to get cleared)
Flags: needinfo?(tnikkel)
(In reply to Timothy Nikkel (:tnikkel) from comment #18)
> We should figure out how to distinguish these two cases

I'm not sure how to do that off the top of my head, examining the stacks for ScrollFrameHelper::RestoreState in both cases might tell us how to find out (or where to add code that lets us find out).
Comment on attachment 8746103 [details]
Simplified test case of gecko non-APZ bug

(In reply to Timothy Nikkel (:tnikkel) from comment #17)
> Yes, the url of the bug, not the reduced testcase. And yes I narrowed the
> window. I tried on my retina screen and on an external monitor. I see the
> same behaviour. The page scrolls back down after closing the video in all
> cases for me.

Sorry, my bad - I thought I tested it on the URL but maybe I didn't. The Firefox/Chrome difference I think only happens on attachment 8746103 [details], which I filed bug 1268195 for. I'll obsolete that attachment on this bug, and move this discussion over to that bug, since it's more relevant there.
Attachment #8746103 - Attachment is obsolete: true
(In reply to Timothy Nikkel (:tnikkel) from comment #15)
> From bug 1235899 comment 18 it sounds like there are cases when we
> definitely do not want "restore" to clobber apz. But in other cases where a
> fling isn't active it seems like it should. Should we have a new type of
> scroll offset update that only clobbers if there isn't something "higher
> priority" happening (or happened) on the apz side?

Seems reasonable. My initial inclination was to just do this on content side, but it's more flexible if we send it over to the APZ and let it decide what's "higher priority".

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #6)
> I fixed it locally by detecting the clamping and changing
> the scroll origin to nsGkAtoms::other. However another fix might be to just
> to re-clamp the scroll position on the APZC side in NotifyLayersUpdated if
> the scrollable rect changes. That's probably a more robust fix.

This is another option that I had considered (of just re-clamping the scroll position in NotifyLayersUpdated), but after thinking it through that's probably riskier and might have unintended side-effects. I don't have a concrete scenario in mind but I was thinking about the incremental loading case, where the APZC might get multiple NLU calls as the page is growing. The first NLU call would clamp the scroll position to a small value and it wouldn't go up as the page gets longer, because the scroll position changes on the main thread with non-clobbering "restore" origins.
Flags: needinfo?(bbermes)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #21)
> (In reply to Timothy Nikkel (:tnikkel) from comment #15)
> > From bug 1235899 comment 18 it sounds like there are cases when we
> > definitely do not want "restore" to clobber apz. But in other cases where a
> > fling isn't active it seems like it should. Should we have a new type of
> > scroll offset update that only clobbers if there isn't something "higher
> > priority" happening (or happened) on the apz side?
> 
> Seems reasonable. My initial inclination was to just do this on content
> side, but it's more flexible if we send it over to the APZ and let it decide
> what's "higher priority".

How can it be done on the content side? Only apz knows if there is a fling in progress that shouldn't be clobbered by a restore. Content knows about the restore, but not the fling.

> (In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #6)
> > I fixed it locally by detecting the clamping and changing
> > the scroll origin to nsGkAtoms::other. However another fix might be to just
> > to re-clamp the scroll position on the APZC side in NotifyLayersUpdated if
> > the scrollable rect changes. That's probably a more robust fix.
> 
> This is another option that I had considered (of just re-clamping the scroll
> position in NotifyLayersUpdated), but after thinking it through that's
> probably riskier and might have unintended side-effects. I don't have a
> concrete scenario in mind but I was thinking about the incremental loading
> case, where the APZC might get multiple NLU calls as the page is growing.
> The first NLU call would clamp the scroll position to a small value and it
> wouldn't go up as the page gets longer, because the scroll position changes
> on the main thread with non-clobbering "restore" origins.

I would agree, this does sound riskier.
Jim, you NI'd me to confirm blocking or not, right? 

I've tried it myself and suggest to have this block release.
Flags: needinfo?(bbermes)
Depends on: 1268195
See Also: 1268195
(In reply to Timothy Nikkel (:tnikkel) from comment #22)
> (In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #21)
> > (In reply to Timothy Nikkel (:tnikkel) from comment #15)
> > > From bug 1235899 comment 18 it sounds like there are cases when we
> > > definitely do not want "restore" to clobber apz. But in other cases where a
> > > fling isn't active it seems like it should. Should we have a new type of
> > > scroll offset update that only clobbers if there isn't something "higher
> > > priority" happening (or happened) on the apz side?
> > 
> > Seems reasonable. My initial inclination was to just do this on content
> > side, but it's more flexible if we send it over to the APZ and let it decide
> > what's "higher priority".
> 
> How can it be done on the content side? Only apz knows if there is a fling
> in progress that shouldn't be clobbered by a restore. Content knows about
> the restore, but not the fling.

I think actually in some cases we do want the fling to be clobbered by a restore, such as in this bug, where the restore is accompanied by a change in the scrollable area and the scroll position gets clamped. What I was thinking of doing on the content side was checking to see if the scroll position was getting clamped and if so, just sending over a regular scroll position update that clobbers the APZ side. In other words, I think this is the expected behaviour:

1) frame reconstruction, APZ idle -> doesn't matter if we clobber or not, since scroll pos is the same
2) frame reconstruction, APZ animating -> don't clobber (bug 1235899)
3) frame reconstruction with scroll position clamping, APZ idle -> clobber
4) frame reconstruction with scroll position clamping, APZ animating -> clobber

If we do it on the content side, we can distinguish the "scroll position clamping" cases (3 and 4) from cases 1 and 2, and send over the appropriate clobber message, which would result in the desired behaviour. If we do what you suggested and don't clobber if APZ is animating, then we get undesirable behaviour for case (4), because the animation could end with the scroll offset in an invalid position. If we want to still handle it on the APZ side and get correct behaviour, we have to send over extra state information from the content side, indicating if the scroll position got clamped.
Attached patch PatchSplinter Review
This is the patch based on my reasoning above.
Attachment #8747761 - Flags: review?(tnikkel)
Comment on attachment 8747761 [details] [diff] [review]
Patch

Your reasoning seems good to me.
Attachment #8747761 - Flags: review?(tnikkel) → review+
https://hg.mozilla.org/mozilla-central/rev/a6c2e1b2b85f
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Verified as fixed using Nightly 49.0a1 2016-05-05 under Win 7 64-bit and Mac OS X 10.9.5.
Status: RESOLVED → VERIFIED
Hey kats, do you feel comfortable uplifting this to 48 and 47?
Flags: needinfo?(bugmail.mozilla)
Considering this depends on bug 1268195 and bug 1269539 I'd be hesitant to uplift to 47. 48 is probably ok, let me do the rebase and check some stuff to see what's needed. Leaving ni on me for now.
Ok, I verified with an uplift of this patch to aurora [1] that it fixes the issue in comment 0, but upon dismissing the video you still get a blank screen (this is as expected). With the two dependencies uplifted as well [2] both issues are fixed. I'll flag this for uplift to aurora.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=e41e02fd450d
[2] https://treeherder.mozilla.org/#/jobs?repo=try&revision=85699434b2b8
Flags: needinfo?(bugmail.mozilla)
No longer blocks: 1261017
Comment on attachment 8747761 [details] [diff] [review]
Patch

Approval Request Comment
[Feature/regressing bug #]: bug 1235899
[User impact if declined]: in some cases you can end up with a blank browser window that doesn't go away until the user takes some action that triggers a repaint, such as scrolling
[Describe test coverage new/current, TreeHerder]: patch includes a test
[Risks and why]: this patch is fairly small and low risk. it negates a little bit of the change in bug 1235899 which impacted too many scenarios.
[String/UUID change made/needed]: none
Attachment #8747761 - Flags: approval-mozilla-aurora?
Comment on attachment 8747761 [details] [diff] [review]
Patch

Fix for regression from 47, includes a test, let's uplift to aurora.
Attachment #8747761 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Depends on: 1520320
You need to log in before you can comment on or make changes to this bug.