Closed Bug 1286179 Opened 8 years ago Closed 8 years ago

[e10s][APZ] Scroll position on some pages is lost after reload(F5) if e10s was enabled

Categories

(Core :: Layout, defect, P2)

47 Branch
defect

Tracking

()

VERIFIED FIXED
mozilla51
Tracking Status
e10s - ---
firefox47 --- wontfix
firefox48 + wontfix
firefox49 + verified
firefox50 + verified
firefox51 --- verified

People

(Reporter: alice0775, Assigned: kats)

References

Details

(Keywords: regression)

Attachments

(3 files)

+++ This bug was initially created as a clone of Bug #1269539 +++

Problem does not seem to occur if e10s was disabled.

Steps To Reproduce:
1. Ensure e10s enabled
2. Open http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-inbound-win32/
3. Scroll to the bottom of the page
4. Reload (Ff5)
Actual Results:
Scroll position is lost. Scroll position is restored at random.

Expected Results:
Scroll position should be restored.
OS: Unspecified → Windows 10
Hardware: Unspecified → x86
Depends on: 1269539
regression window:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=b5e15ecccb6cd9e454e69f44e2d7e0e41a29361c&tochange=04710089fa641574fb12e6fb5aa24a28568a7e99

Regressed by: Bug 1235899


Indeed, setting layers.async-pan-zoom.enabled to false fixes the problem
Blocks: 1235899
Flags: needinfo?(bugmail)
Flags: needinfo?(botond)
Keywords: regression
Summary: [e10s] Scroll position lost after reload(F5) if e10s was enabled → [e10s][APZ] Scroll position lost after reload(F5) if e10s was enabled
[Tracking Requested - why for this release]:
Even weirder, after the F5 the scroll thumb's position is still as if you were at the end of the page (even though you're not), and when you scroll again it "jumps" to become in sync with the scroll position (which is somewhere in the middle of the page).
I saw this once on Aurora this morning and now I can't reproduce it. Must be intermittent. I'll make a try build with a patch that might fix the problem, based on a discussion I was having with Randall yesterday.
The discussion I was having with Randall was unrelated, I got confused about which bug created this regression. Botond was able to make the test from bug 1269539 fail using the (much longer) page in comment 0, and I was able to reproduce that as well. I'm investigating.
Assignee: nobody → bugmail
Flags: needinfo?(bugmail)
Flags: needinfo?(botond)
OS: Windows 10 → All
Hardware: x86 → All
Version: Trunk → 47 Branch
Summary: [e10s][APZ] Scroll position lost after reload(F5) if e10s was enabled → [e10s][APZ] Scroll position on very long pages is lost after reload(F5) if e10s was enabled
Attached file Stack fragment —
Attached is a fragment of the stack from when the "load" event is fired to content. It goes through nsDocumentViewer::LoadComplete and even calls PresShell::LoadComplete at [1] but this doesn't actually seem to invoke a call to ScrollToRestoredPosition().

PresShell::EndLoad(), which *does* call ScrollToRestoredPosition(), is called earlier while the page is still actually loading (i.e. the max scroll offset is still small), and doesn't get called again after the page is actually done (the max scroll offset is now what it should be).

tn, do you know what codepath is supposed to trigger a call to ScrollToRestoredPosition after nsDocumentViewer::LoadComplete sets mLoaded (per bug 1269539 comment 1)? Can I just add a new call to RestoreRootScrollPosition() in PresShell::LoadComplete()?

[1] http://searchfox.org/mozilla-central/rev/c44d0b1673fef5e0e2e19fa82d6780a74c186151/layout/base/nsDocumentViewer.cpp#1033
Flags: needinfo?(tnikkel)
Actually ignore that. Further investigation seems to indicate that something else is going on, perhaps the the load is going on for so long that the APZ has enough time to send a repaint request and clobber the scroll. I'm digging in a bit more.
Flags: needinfo?(tnikkel)
Ok, I think I got a handle on what's going on. While the page is reloading, we are doing periodic layouts. This involves:
a) The code in ScrollToRestoredPosition running, which calls ScrollToWithOrigin(... nsGkAtoms::restore ...) [1]
b) ScrollToWithOrigin "promoting" the origin from nsGkAtoms::restore to nsGkAtoms::other [2] because the destination scroll position hasn't yet been reached.
c) Periodically these get flushed over to the compositor and since the scroll origin is nsGkAtoms::other, we tag the metrics as having scroll offset updated [3].
d) When APZ receives a scroll offset update, it sends back a repaint request to make sure the displayport margins and stuff are all good [4].
e) By the time those repaint requests make their way back to the main thread, their scroll generation is usually stale (because the main thread has done another update in the meantime), which means the call at [5] fails to clear the scroll origin on the scrollframe (which is still nsGkAtoms::other), and consequently [6] returns true which makes the repaint request not try to set a stale scroll position.

This is all well and good. Even if in step (e) the main thread hasn't done another update, and the APZ repaint request does set the scroll position, it's going to be the same as the old scroll position and becomes a no-op.

The problem occurs once the nsGfxScrollFrame reaches the target scroll position that it's trying to restore. Once it does, the condition at [2] fails, because the scroll position is no longer clamped, and so the nsGkAtoms::restore doesn't get promoted to nsGkAtoms::other. If, after this point, an inflight APZ repaint request arrives (step e), it will clobber the main-thread scroll position. This happens even though the repaint request has a stale generation, because the main-thread origin is nsGkAtoms::restore and so [6] returns false.

I have a patch which updates the condition at [2] to disallow demoting an existing non-clobberable scroll origin to nsGkAtoms::restore, which *is* clobberable. That seems to fix the problem for me on the test case.

[1] http://searchfox.org/mozilla-central/rev/92616e983b8ad6e99ec148f341e146c3c6fa312a/layout/generic/nsGfxScrollFrame.cpp#4115
[2] http://searchfox.org/mozilla-central/rev/92616e983b8ad6e99ec148f341e146c3c6fa312a/layout/generic/nsGfxScrollFrame.cpp#2184
[3] http://searchfox.org/mozilla-central/rev/92616e983b8ad6e99ec148f341e146c3c6fa312a/layout/base/nsLayoutUtils.cpp#8903
[4] http://searchfox.org/mozilla-central/rev/92616e983b8ad6e99ec148f341e146c3c6fa312a/gfx/layers/apz/src/AsyncPanZoomController.cpp#3388
[5] http://searchfox.org/mozilla-central/rev/92616e983b8ad6e99ec148f341e146c3c6fa312a/gfx/layers/apz/util/APZCCallbackHelper.cpp#127
[6] http://searchfox.org/mozilla-central/rev/92616e983b8ad6e99ec148f341e146c3c6fa312a/gfx/layers/apz/util/APZCCallbackHelper.cpp#99
[Tracking Requested - why for this release]:
apz is on in 48.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #10)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=9c79af2c931b

That one's looking good. Cleaned up patches coming.
Per comment #11, track this on 48/49/50.
Patch looks safe enough. Kats: how do you feel about a 48 uplift?
Flags: needinfo?(bugmail)
I would prefer to avoid uplifting to 48 this late in the beta cycle. The patch is relatively small, but the code paths here are not that well understood (by me, at least) and we've had a number of bugs in this area. I wouldn't be surprised if this change regressed something. I'd be ok with uplifting to 49 but would rather not uplift to 48.
Flags: needinfo?(bugmail)
Hi Kartikaya,
Can you create a uplift request for 49?
Flags: needinfo?(bugmail)
Marking 'wontfix' for 48 per comment 17.
(In reply to Gerry Chang [:gchang] from comment #18)
> Hi Kartikaya,
> Can you create a uplift request for 49?

Yes, once it lands.
Flags: needinfo?(bugmail)
Summary: [e10s][APZ] Scroll position on very long pages is lost after reload(F5) if e10s was enabled → [e10s][APZ] Scroll position on some pages is lost after reload(F5) if e10s was enabled
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #8)
> I have a patch which updates the condition at [2] to disallow demoting an
> existing non-clobberable scroll origin to nsGkAtoms::restore, which *is*
> clobberable. That seems to fix the problem for me on the test case.

I'm not sure if this makes intuitive sense to me. What would make sense to me what would to change the condition in ScrollFrameHelper::ScrollToWithOrigin to whether we changed the scroll position of the scroll frame. ie if the scroll is a restore scroll and it moves the scroll position, then it is an important one that shouldn't be clobbered. I don't think the clamping is the important part, its the fact that we moved the scroll frame. Does that make sense? And fix the bug? And not regress anything else? Or am I missing something?
Just a quick update: I did a try push with your suggestion, and it's failing a bunch of scrolling-related tests: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6b5b2dd8f887&selectedJob=24390196

I haven't had a chance to dig into the failures yet and probably won't until Thursday. However I might just try doing the other approach we discussed where I propagate more information to APZ and move all the clobber logic into APZCCallbackHelper; I'm not sure it's worth the time to wrap my head around these failures.
Priority: -- → P2
I wrote a new patch to address this problem, more in line with what we discussed but still sort of an incremental change so that it's safer to uplift. The way the new patch works is that it flags the APZ repaint requests as being either triggered by a main-thread scroll update or by a user action. The repaint requests that are triggered the main-thread scroll update will not try to do the ScrollTo call.

In terms of the explanation in comment 8, step (e) is modified so that it doesn't actually try to set the scroll position on the nsGfxScrollFrame at all. This makes it impossible for it to clobber the updated main-thread scroll offset. The patch fixes this bug, and I'm planning to make additional incremental changes to tackle bug 1292613.

Try push at https://treeherder.mozilla.org/#/jobs?repo=try&revision=72d3ff97d333
Try push seems ok. Reviews coming up.
Comment on attachment 8771056 [details]
Bug 1286179 - Update an existing test to catch this new instance of the bug.

I can't review this in mozreview because of bug 1297545.
Attachment #8771056 - Flags: review?(tnikkel) → review+
Comment on attachment 8771057 [details]
Bug 1286179 - For APZ repaint requests that are triggered by main-thread updates, don't attempt to re-scroll the main thread.

https://reviewboard.mozilla.org/r/64316/#review71546

This is much easier to follow!
Attachment #8771057 - Flags: review?(tnikkel) → review+
Comment on attachment 8771056 [details]
Bug 1286179 - Update an existing test to catch this new instance of the bug.

https://reviewboard.mozilla.org/r/64314/#review71548
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/60b4b2b76e0c
Update an existing test to catch this new instance of the bug. r=tnikkel
https://hg.mozilla.org/integration/autoland/rev/7f7fdb130473
For APZ repaint requests that are triggered by main-thread updates, don't attempt to re-scroll the main thread. r=tnikkel
https://hg.mozilla.org/mozilla-central/rev/60b4b2b76e0c
https://hg.mozilla.org/mozilla-central/rev/7f7fdb130473
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Comment on attachment 8771056 [details]
Bug 1286179 - Update an existing test to catch this new instance of the bug.

Approval Request Comment
[Feature/regressing bug #]: bug 1235899 (part of APZ work)
[User impact if declined]: in some cases reloading the page doesn't restore the scroll position properly. Generally limited to very long pages. Note that this patch also touches code related to bug 1292781, which is needed to fix the 49-blocker bug 1292613. Although not a strict dependency, uplifting this will make uplifting bug 1292781 easier and less risky.
[Describe test coverage new/current, TreeHerder]: Updated an automated test to cover this case
[Risks and why]: There's some risk because this code has been tweaked a number of times, as we keep finding new scenarios it didn't cover. However we also have an increasing amount of test coverage for this code so I think it should ok to uplift after some QA cycles to verify it on m-c.
[String/UUID change made/needed]: none
Attachment #8771056 - Flags: approval-mozilla-beta?
Attachment #8771056 - Flags: approval-mozilla-aurora?
Attachment #8771057 - Flags: approval-mozilla-beta?
Attachment #8771057 - Flags: approval-mozilla-aurora?
kats - We don't have a lot of time before shipping 49 - building beta 7 tomorrow, then 8 and 9 next week, then the release candidate. Thanks for pointing out the relation to bug 129613. Do you have any specific suggestions for manual testing here?

I am inclined to uplift now (or tomorrow morning depending on manual tests of the fix) for beta 7.  Then we can aim for the related uplifts on Monday for beta 8. 

Florin - Can your team do some testing today around scroll position and reloading pages on m-c?
Flags: needinfo?(florin.mezei)
Flags: needinfo?(bugmail)
The manual testing I would recommend is mostly just making sure the scroll position is restored properly on reload and history navigation. Also testing the STR in bug 1235899 would be good.
Flags: needinfo?(bugmail)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #34)
> Florin - Can your team do some testing today around scroll position and
> reloading pages on m-c?

We'll make sure this gets looked at, as soon as possible.
Flags: needinfo?(florin.mezei) → needinfo?(andrei.vaida)
QA Contact: petruta.rasa
I could still reproduce this using the link from comment 0 and the test page from the duplicate using Nightly 51.0a1 2016-08-25 under Mac OS X 10.9.5 and Ubuntu 14.04 64-bit.
This seems only fixed under Win 10 64-bit.

I'll continue testing with the STR from bug 1235899.
Flags: needinfo?(andrei.vaida)
Thank you Petruta! 
kats, is that helpful? Should we still uplift as is to take care of Win10 cases? As long as we aren't making things *worse* for Mac and Ubuntu it may still be a good idea.
Flags: needinfo?(bugmail)
(In reply to Petruta Rasa [QA] [:petruta] from comment #37)
> I could still reproduce this using the link from comment 0 and the test page
> from the duplicate using Nightly 51.0a1 2016-08-25 under Mac OS X 10.9.5 and
> Ubuntu 14.04 64-bit.
> This seems only fixed under Win 10 64-bit.
> 
> I'll continue testing with the STR from bug 1235899.

Can you also please check with a build that contains bug 1292781? I tried just now and while I was able to intermittently repro what you saw on the Aug 25 nightly, I don't see it at all on the inbound build from the other bug. I think the patch from that bug may have fixed the rest of this. I also applied both bugs onto beta tip and did a try build, at https://treeherder.mozilla.org/#/jobs?repo=try&revision=8b242d2df8ee&group_state=expanded - it would be worth testing on that as well to see if it happens.
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #38)
> Thank you Petruta! 
> kats, is that helpful? Should we still uplift as is to take care of Win10
> cases? As long as we aren't making things *worse* for Mac and Ubuntu it may
> still be a good idea.

I agree that as long as we're not making things worse it's still worth uplifting. I'd like to give Petruta a bit more time to see if any regressions turn up with more testing.
Flags: needinfo?(bugmail)
The scroll position is restored on the two test pages using the try build (same OS's as in comment 37 except for Linux where I used 14.04 32-bit). So for those pages the try build fixes the issues.

However, I noticed that on Twitter (eg https://twitter.com/search?q=rio+2016&partner=Firefox&source=desktop-search) the scroll position is not kept after the first or the second reloading of the page. This happens on try build and on latest Nightly, but it doesn't reproduce using Firefox 46 RC.
(In reply to Petruta Rasa [QA] [:petruta] from comment #41)
> However, I noticed that on Twitter (eg
> https://twitter.com/search?q=rio+2016&partner=Firefox&source=desktop-search)
> the scroll position is not kept after the first or the second reloading of
> the page. This happens on try build and on latest Nightly, but it doesn't
> reproduce using Firefox 46 RC.

I discussed this with Petruta on IRC, and I was able to reproduce this bug but only under very specific circumstances (you have to be scrolling down such that twitter is loading more content, and then reload while that's happening). I'm fairly confident that's a more general APZ regression and not specifically from this bug - the problem is that while twitter is loading more things, any extra scroll that happens will be in the compositor and doesn't get synced back to the main thread right away. So when reloading we just restore to what the main thread knew as the last scroll position. It's not something that's very severe, I think, and would be be pretty hard to fix given the nature of APZ.
Comment on attachment 8771056 [details]
Bug 1286179 - Update an existing test to catch this new instance of the bug.

Given this feedback from testing and from kats, let's uplift both patches; it may make beta 7 but more likely beta 8 next week.
Attachment #8771056 - Flags: approval-mozilla-beta?
Attachment #8771056 - Flags: approval-mozilla-beta+
Attachment #8771056 - Flags: approval-mozilla-aurora?
Attachment #8771056 - Flags: approval-mozilla-aurora+
Attachment #8771057 - Flags: approval-mozilla-beta?
Attachment #8771057 - Flags: approval-mozilla-beta+
Attachment #8771057 - Flags: approval-mozilla-aurora?
Attachment #8771057 - Flags: approval-mozilla-aurora+
Verified as fixed using Firefox 49 RC, latest Aurora 50.0a2 and latest Nightly 51.0a1 under Win 10 64-bit, Ubuntu 16.04 64-bit and Mac OS X 10.9.5.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: