Closed Bug 1292781 Opened 3 years ago Closed 3 years ago

[e10s] Treeherder scroll position back to the top after loading new results if HWA was disabled

Categories

(Core :: Layout, defect, P2)

48 Branch
x86
Windows 10
defect

Tracking

()

RESOLVED FIXED
mozilla51
Tracking Status
e10s + ---
firefox48 --- wontfix
firefox49 --- fixed
firefox-esr45 --- unaffected
firefox50 --- fixed
firefox51 --- fixed

People

(Reporter: alice0775, Assigned: kats)

References

Details

(Keywords: regression)

Attachments

(2 files)

+++ This bug was initially created as a clone of Bug #1292613 +++

The problem does occur if e10s was disabled.

Steps to reproduce:
 1. Go to https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound
 2. Scroll down all the way.
 3. Click the "10" button to load more results.
 4. Wait until the new results have loaded.

Actual results:
Treeherder scroll position back to the top. And Flashes once.

Expected results:
Treeherder scroll position should be kept.

Regression window:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=11c3335d77fb751128c456daa204385d69678484&tochange=4de1094b41b27db10a5b0e5683ba4272c505ee9c

Suspect: Bug 1255968
See Also: → 1292613
Summary: Treeherder scroll position back to the top after loading new results if HWA was disabled → [e10s] Treeherder scroll position back to the top after loading new results if HWA was disabled
Flags: needinfo?(janus926)
Priority: -- → P2
Checked e10s disabled version on Linux, nsWindow::HasPendingInputEvent() does not return true when click the "10" button, however nsPuppetWidget::HasPendingInputEvent() does because of event Msg_SynthMouseMoveEvent__ID when e10s is enabled.
Flags: needinfo?(janus926)
Not quite sure what's the purpose of synth mouse event, but it's not something will be checked in the other native widgets' HasPendingInputEvent(). Probably I should just remove it.
(In reply to Ting-Yu Chou [:ting] from comment #1)
> Checked e10s disabled version on Linux, nsWindow::HasPendingInputEvent()
> does not return true when click the "10" button, however
> nsPuppetWidget::HasPendingInputEvent() does because of event
> Msg_SynthMouseMoveEvent__ID when e10s is enabled.

I was wrong, it doesn't have to be SynthMouseMoveEvent. After I removed the event from nsPuppetWidget::HasPendingInputEvent, I can still repro if I move the mouse around after pressing the "10" button. Also when e10s is disabled, if I do the same, I can see the screen back to the top, and then goes down to the original position again.
When e10s is enabled, after pressing "10" ScrollToWithOrigin() will be called in content process with zero x and y from APZChild::RecvUpdateFrame():

#0  mozilla::ScrollFrameHelper::ScrollToWithOrigin (
    this=this@entry=0x7effb5445260, aScrollPosition=..., 
    aMode=aMode@entry=nsIScrollableFrame::INSTANT, 
    aOrigin=aOrigin@entry=0x7effc81b8180, aRange=aRange@entry=0x7ffc4e5f8800, 
    aSnap=aSnap@entry=nsIScrollbarMediator::DISABLE_SNAP)
    at /home/ting/w/fx/mc/layout/generic/nsGfxScrollFrame.cpp:2118
#1  0x00007effe1a4d3e5 in mozilla::ScrollFrameHelper::ScrollToCSSPixelsApproximate (this=0x7effb5445260, aScrollPosition=..., aOrigin=0x7effc81b8180)
    at /home/ting/w/fx/mc/layout/generic/nsGfxScrollFrame.cpp:2097
#2  0x00007effe0a1c895 in ScrollFrameTo (aSuccessOut=<synthetic pointer>, 
    aPoint=<synthetic pointer>, aFrame=0x7effb5445248)
    at /home/ting/w/fx/mc/gfx/layers/apz/util/APZCCallbackHelper.cpp:101
#3  mozilla::layers::ScrollFrame (aContent=aContent@entry=0x7eff77cb5980, 
    aMetrics=...)
    at /home/ting/w/fx/mc/gfx/layers/apz/util/APZCCallbackHelper.cpp:132
#4  0x00007effe0a1ca78 in mozilla::layers::APZCCallbackHelper::UpdateSubFrame (
    aMetrics=...)
    at /home/ting/w/fx/mc/gfx/layers/apz/util/APZCCallbackHelper.cpp:286
#5  0x00007effe16a53ec in mozilla::dom::TabChildBase::UpdateFrameHandler (
    this=<optimized out>, aFrameMetrics=...)
    at /home/ting/w/fx/mc/dom/ipc/TabChild.cpp:252
#6  0x00007effe048dfd6 in mozilla::layers::PAPZChild::OnMessageReceived (
    this=0x7eff73ebc100, msg__=...)
    at /home/ting/w/fx/mc/obj-x86_64-pc-linux-gnu/ipc/ipdl/PAPZChild.cpp:387

In parent process, SendUpdateFrame() is called with following stack:

#0  mozilla::layers::PAPZParent::SendUpdateFrame (this=0x7fffba55d178, 
    frame=...)
    at /home/ting/w/fx/mc/obj-x86_64-pc-linux-gnu/ipc/ipdl/PAPZParent.cpp:62
#1  0x00007fffeb7a08a2 in mozilla::layers::AsyncPanZoomController::RequestContentRepaint (this=this@entry=0x7fffc4394000, aFrameMetrics=..., aVelocity=...)
    at /home/ting/w/fx/mc/gfx/layers/apz/src/AsyncPanZoomController.cpp:2912
#2  0x00007fffeb7a0a9e in mozilla::layers::AsyncPanZoomController::RequestContentRepaint (this=0x7fffc4394000)
    at /home/ting/w/fx/mc/gfx/layers/apz/src/AsyncPanZoomController.cpp:2846
#3  0x00007fffeb7b18d7 in applyImpl<mozilla::layers::AsyncPanZoomController, void (mozilla::layers::AsyncPanZoomController::*)()> (args=..., 
    m=<optimized out>, o=<optimized out>)
    at /home/ting/w/fx/mc/obj-x86_64-pc-linux-gnu/dist/include/nsThreadUtils.h:729
#4  apply<mozilla::layers::AsyncPanZoomController, void (mozilla::layers::AsyncPanZoomController::*)()> (m=<optimized out>, o=<optimized out>, 
    this=<optimized out>)
    at /home/ting/w/fx/mc/obj-x86_64-pc-linux-gnu/dist/include/nsThreadUtils.h:736
#5  mozilla::detail::RunnableMethodImpl<void (mozilla::layers::AsyncPanZoomController::*)(), true, false>::Run (this=<optimized out>)
    at /home/ting/w/fx/mc/obj-x86_64-pc-linux-gnu/dist/include/nsThreadUtils.h:764
#6  0x00007fffeae7e239 in nsThread::ProcessNextEvent (this=0x7ffff6bb5050, 
    aMayWait=<optimized out>, aResult=0x7fffffffc70f)
    at /home/ting/w/fx/mc/xpcom/threads/nsThread.cpp:1058

Note if PuppetWidget::HasPendingInputEvent() returns false, RequestContentRepaint() will return earlier from the if clause [1].

[1] https://dxr.mozilla.org/mozilla-central/rev/6e191a55c3d23e83e6a2e72e4e80c1dc21516493/gfx/layers/apz/src/AsyncPanZoomController.cpp#2882
mFrameMetrics.mScrollOffset is (0, 0) for the AsyncPanZoomController, do you know why it's not changed when I scroll down to the bottom on treeherder?
Flags: needinfo?(bugmail)
(In reply to Ting-Yu Chou [:ting] from comment #5)
> mFrameMetrics.mScrollOffset is (0, 0) for the AsyncPanZoomController, do you
> know why it's not changed when I scroll down to the bottom on treeherder?

When you scroll TH, it's a div element with id th-global-content that actually scrolls. The document.scrollingElement.scrollTop remains zero, as that's not the element that actually scrolls. Most likely you were looking at the APZC for the root element when you should be looking at the APZC for the div.
Flags: needinfo?(bugmail)
See Also: → 1286179
(In reply to Alice0775 White from comment #0)
> Suspect: Bug 1255968

I can confirm that backing out bug 1255968 locally makes this problem go away for me.
The APZC for the div has correct mFrameMetrics.mScrollOffset.y when scroll down to the bottom, but after pressing "10", it will be set to 0 by stack:

#0  CopyScrollInfoFrom (aOther=..., this=0x7fff719d0858)
    at /home/ting/w/fx/mc/gfx/layers/FrameMetrics.h:232
#1  mozilla::layers::AsyncPanZoomController::NotifyLayersUpdated (
    this=this@entry=0x7fff719d0800, aScrollMetadata=..., 
    aIsFirstPaint=aIsFirstPaint@entry=false, 
    aThisLayerTreeUpdated=<optimized out>)
    at /home/ting/w/fx/mc/gfx/layers/apz/src/AsyncPanZoomController.cpp:3408
#2  0x00007fffeb7ac543 in mozilla::layers::APZCTreeManager::PrepareNodeForLayer
    (this=this@entry=0x7fffc26c7c00, aLayer=..., aMetrics=..., 
    aLayersId=aLayersId@entry=4, aAncestorTransform=..., 
    aParent=aParent@entry=0x7fffbcabb280, aNextSibling=0x0, aState=...)
    at /home/ting/w/fx/mc/gfx/layers/apz/src/APZCTreeManager.cpp:463
#3  0x00007fffeb7aca44 in mozilla::layers::APZCTreeManager::UpdateHitTestingTree (this=this@entry=0x7fffc26c7c00, aState=..., aLayer=..., 
    aLayersId=aLayersId@entry=4, aAncestorTransform=..., 
    aParent=aParent@entry=0x7fffbcabb280, aNextSibling=0x0)
    at /home/ting/w/fx/mc/gfx/layers/apz/src/APZCTreeManager.cpp:571
#4  0x00007fffeb7acb67 in mozilla::layers::APZCTreeManager::UpdateHitTestingTree (this=this@entry=0x7fffc26c7c00, aState=..., aLayer=..., 
    aLayersId=aLayersId@entry=4, aAncestorTransform=..., 
    aParent=0x7fffbcabb280, aParent@entry=0x7fffc2609a80, aNextSibling=0x0)
    at /home/ting/w/fx/mc/gfx/layers/apz/src/APZCTreeManager.cpp:602
#5  0x00007fffeb7acb67 in mozilla::layers::APZCTreeManager::UpdateHitTestingTree (this=this@entry=0x7fffc26c7c00, aState=..., aLayer=..., 
    aLayersId=aLayersId@entry=4, aAncestorTransform=..., 
    aParent=0x7fffc2609a80, aParent@entry=0x7fffc2609940, aNextSibling=0x0)
    at /home/ting/w/fx/mc/gfx/layers/apz/src/APZCTreeManager.cpp:602
#6  0x00007fffeb7acb67 in mozilla::layers::APZCTreeManager::UpdateHitTestingTree (this=this@entry=0x7fffc26c7c00, aState=..., aLayer=..., 
    aLayersId=aLayersId@entry=1, aAncestorTransform=..., 
    aParent=0x7fffc2609940, aParent@entry=0x7fffc2609800, aNextSibling=0x0)
    at /home/ting/w/fx/mc/gfx/layers/apz/src/APZCTreeManager.cpp:602
#7  0x00007fffeb7acb67 in mozilla::layers::APZCTreeManager::UpdateHitTestingTree (this=this@entry=0x7fffc26c7c00, aState=..., aLayer=..., 
    aLayersId=<optimized out>, aAncestorTransform=..., aParent=0x7fffc2609800, 
    aParent@entry=0x0, aNextSibling=0x0)
    at /home/ting/w/fx/mc/gfx/layers/apz/src/APZCTreeManager.cpp:602
#8  0x00007fffeb7acd3d in mozilla::layers::APZCTreeManager::UpdateHitTestingTree (this=0x7fffc26c7c00, aCompositor=aCompositor@entry=0x7fffc2636000, 
    aRoot=0x7fffc1204800, aIsFirstPaint=aIsFirstPaint@entry=false, 
    aOriginatingLayersId=aOriginatingLayersId@entry=4, 
    aPaintSequenceNumber=aPaintSequenceNumber@entry=102)
    at /home/ting/w/fx/mc/gfx/layers/apz/src/APZCTreeManager.cpp:182
#9  0x00007fffeb7f8791 in mozilla::layers::CompositorBridgeParent::NotifyShadowTreeTransaction (this=0x7fffc2636000, aId=aId@entry=4, 
    aIsFirstPaint=aIsFirstPaint@entry=false, 
    aScheduleComposite=aScheduleComposite@entry=true, 
    aPaintSequenceNumber=aPaintSequenceNumber@entry=102, 
    aIsRepeatTransaction=aIsRepeatTransaction@entry=false, aHitTestUpdate=true)
    at /home/ting/w/fx/mc/gfx/layers/ipc/CompositorBridgeParent.cpp:1114
#10 0x00007fffeb8004ec in mozilla::layers::CrossProcessCompositorBridgeParent::ShadowLayersUpdated (this=0x7fff70453800, aLayerTree=0x7fff6fd297c0, 
    aTransactionId=@0x7fffc97fe748: 101, aTargetConfig=..., aPlugins=..., 
    aIsFirstPaint=<optimized out>, aScheduleComposite=true, 
    aPaintSequenceNumber=102, aIsRepeatTransaction=false, aHitTestUpdate=true)
    at /home/ting/w/fx/mc/gfx/layers/ipc/CompositorBridgeParent.cpp:2513
#11 0x00007fffeb811848 in mozilla::layers::LayerTransactionParent::RecvUpdate(nsTArray<mozilla::layers::Edit>&&, nsTArray<mozilla::layers::OpDestroy>&&, unsigned long const&, unsigned long const&, mozilla::layers::TargetConfig const&, nsTArray<mozilla::layers::PluginWindowData>&&, bool const&, bool const&, unsigned int const&, bool const&, mozilla::TimeStamp const&, int const&, nsTArray<mozilla::layers::EditReply>*) (this=0x7fff6fd297c0, cset=<optimized out>, 
    aToDestroy=<optimized out>, aFwdTransactionId=<optimized out>, 
    aTransactionId=<optimized out>, targetConfig=..., 
    aPlugins=<unknown type in /home/ting/w/fx/mc/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so, CU 0x803fa9e, DIE 0x8213acb>, 
    isFirstPaint=@0x7fffc97fe71d: false, 
    scheduleComposite=@0x7fffc97fe71e: true, 
    paintSequenceNumber=@0x7fffc97fe720: 102, 
    isRepeatTransaction=@0x7fffc97fe71f: false, aTransactionStart=..., 
    aPaintSyncId=@0x7fffc97fe724: 0, reply=0x7fffc97fe760)
    at /home/ting/w/fx/mc/gfx/layers/ipc/LayerTransactionParent.cpp:661
#12 0x00007fffeb2e62f4 in mozilla::layers::PLayerTransactionParent::OnMessageReceived (this=0x7fff6fd297c0, msg__=..., reply__=@0x7fffc97fea10: 0x0)
    at /home/ting/w/fx/mc/obj-x86_64-pc-linux-gnu/ipc/ipdl/PLayerTransactionParent.cpp:766
#13 0x00007fffeb4ac65f in mozilla::layers::PCompositorBridgeParent::OnMessageReceived (this=<optimized out>, msg__=..., reply__=@0x7fffc97fea10: 0x0)
    at /home/ting/w/fx/mc/obj-x86_64-pc-linux-gnu/ipc/ipdl/PCompositorBridgeParent.cpp:1213
I haven't sorted it out completely, but probably you want to consider this case as well for the patch of bug 1286179. The restored scroll position is set in mScrollOffset, but |scrollOffsetUpdated| is false because both |mScrollUpdateType| and |mScrollGeneration| [1] are 0, so it is not copied [2].

[1] https://dxr.mozilla.org/mozilla-central/rev/054d4856cea6150a6638e5daf7913713281af97d/gfx/layers/apz/src/AsyncPanZoomController.cpp#3290
[2] https://dxr.mozilla.org/mozilla-central/rev/054d4856cea6150a6638e5daf7913713281af97d/gfx/layers/apz/src/AsyncPanZoomController.cpp#3379
Flags: needinfo?(bugmail)
When ScrollFrameHelper::ScrollToRestoredPosition() gets called to restore the scroll position, ScrollFrameHelper::ScrollToWithOrigin() does not overwrite |aOrigin| to nsGkAtoms::other because |mDestination| equals |aScrollPosition|. That's why CanScrollOriginClobberApz() returns false and not calling SetScrollOffsetUpdated().
Right after a reflow gets interrupted, if the scrollable frame's scroll position is not restored and a refresh driver tick comes in. The scroll offset (0,0) will be updated to APZC because the renewed nsHTMLScrollFrame still has initialized value nsGkAtoms::other in its |mHelper.mLastScrollOrigin|, which CanScrollOriginClobberApz() will return true.

And the reason why the following restoration does not work is comment 10.

When e10s is disabled (./mach run --disable-e10s), APZC::NotifyayersUpdated() is not called.
Thanks for chasing this down. I was planning on rewriting a bunch of this code anyway since it's so brittle, I'll take this bug and make sure that my rewrite fixes this as well.
Assignee: nobody → bugmail
Flags: needinfo?(bugmail)
Great, thank you. :)
Blocks: 1292613
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #12)
> Thanks for chasing this down. I was planning on rewriting a bunch of this
> code anyway since it's so brittle, I'll take this bug and make sure that my
> rewrite fixes this as well.

If the hope is that this also fixes bug 1292613, the uplift to beta (49) would be needed, so lets keep that in mind with the 'rewrite'.
Attached patch PatchSplinter Review
So the problem here (as Ting mostly outlined already) is that there is an interrupted reflow at a very inopportune time, right in the middle of a frame reconstruction. So the scrollframe's state is saved, it is destroyed and recreated, the state is restored. Normally as part of the same tick the layout code will call ScrollToRestoredPosition() on the scrollframe, but in this case that doesn't happen. Instead there's a layers update that gets pushed out, and then some time later we get the ScrollToRestoredPosition() call.

What that means is that when the layers update goes out, it sends a scroll update to the APZ with a position of (0,0). The APZ dutifully updates to that scroll position. This behaviour is actually correct, because it would be showing garbage if it remained at the other scroll offset. This also happens with APZ turned off; the page "jumps" to the 0,0 position briefly, and then when ScrollToRestoredPosition() is called it jumps back. In the APZ case, when the ScrollToRestoredPosition() is called, it sets the scroll origin to nsGkAtoms::restore, and since that's not allowed to clobber the APZ scroll offset [1], the APZ never even hears about this position change. And so we end up with the main thread and APZ scroll offsets in different places.

As I mentioned in bug 1292613 comment 7, on Linux and Windows we somehow get out of this state because of an APZ repaint request, which puts us back at (0,0). On OS X we don't, and we end up in a perma-checkerboard state where you can't even scroll because the APZ hit-test fails. That's a separate issue that I'll tackle later.

The fix I have for this bug ensures that even scroll updates with origin nsGkAtoms::restore get sent over to the APZ, and then the APZ can decide whether to apply them or not based on whether or not the user has scrolled in the meantime. This is equivalent to the main-thread code in ScrollToRestoredPosition(), which tries to restore the scroll position but aborts if the user manually scrolled in the interim.

Patch is attached, but I'm not sure how to write a test that replicates this interrupted reflow. I also need to do a try push to make sure it doesn't break anything else. Note that this patch applies on top of the one from bug 1286179 but can probably be rebased without if needed, since the two are conceptually independent bugs.

[1] http://searchfox.org/mozilla-central/rev/ae78ab94fadabc89fc6258d03c4a1a70f763f43a/layout/base/nsLayoutUtils.cpp#8870
I wrote a test too. Had to add a new DOMWindowUtils API to force a reflow interrupt, but it seems to work. Try push at https://treeherder.mozilla.org/#/jobs?repo=try&revision=b114df712ff8
Comment on attachment 8782580 [details] [diff] [review]
Patch

Need to iron out an intermittent failure in the test, but in the meantime the patch should be ready for review.
Attachment #8782580 - Flags: review?(tnikkel)
Attachment #8782580 - Flags: review?(tnikkel) → review+
Attachment #8783547 - Flags: review?(tnikkel) → review+
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/72d0178222e5
Send scroll-position-restore updates to APZ, but don't allow them to clobber user scrolls. r=tnikkel
https://hg.mozilla.org/integration/mozilla-inbound/rev/df8d6a12b605
Add a test. r=tnikkel
https://hg.mozilla.org/mozilla-central/rev/72d0178222e5
https://hg.mozilla.org/mozilla-central/rev/df8d6a12b605
Status: NEW → RESOLVED
Closed: 3 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Comment on attachment 8782580 [details] [diff] [review]
Patch

Approval Request Comment
[Feature/regressing bug #]: bug 1255968 technically, but it just exposed an underlying bug in the APZ code
[User impact if declined]: In some cases changes on the page that trigger an interruptible reflow can leave the page scrolled to the wrong position. Even worse, this can result in a perma-checkerboarding state and inability to scroll until the user tab-switches away and back. Bug 1292613 (tagged as a 49 blocker) is an example of this.
[Describe test coverage new/current, TreeHerder]: Added an automated test to cover this scenario
[Risks and why]: Same as with bug 1286179: there's some risk because this code has been tweaked a number of times, as we keep finding new scenarios it didn't cover. However we also have an increasing amount of test coverage for this code so I think it should ok to uplift after some QA cycles to verify it on m-c.
[String/UUID change made/needed]: none
Attachment #8782580 - Flags: approval-mozilla-beta?
Attachment #8782580 - Flags: approval-mozilla-aurora?
Attachment #8783547 - Flags: approval-mozilla-beta?
Attachment #8783547 - Flags: approval-mozilla-aurora?
I also kicked off a try build with the patches from bug 1286179 and this bug applied on top of beta, so that we can verify the patches work after uplift as well.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=8b242d2df8ee
Comment on attachment 8782580 [details] [diff] [review]
Patch

Let's uplift this and test on beta. This should help not only with the treeherder issues but with a release blocking scrolling/checkerboard bug.
Attachment #8782580 - Flags: approval-mozilla-beta?
Attachment #8782580 - Flags: approval-mozilla-beta+
Attachment #8782580 - Flags: approval-mozilla-aurora?
Attachment #8782580 - Flags: approval-mozilla-aurora+
Comment on attachment 8782580 [details] [diff] [review]
Patch

Actually let's hold off here until we have a better solution in bug 1286179. kats is working on it. We may be able to come up with a fix for beta 8 next monday.
Attachment #8782580 - Flags: approval-mozilla-beta?
Attachment #8782580 - Flags: approval-mozilla-beta+
Attachment #8782580 - Flags: approval-mozilla-aurora?
Attachment #8782580 - Flags: approval-mozilla-aurora+
Comment on attachment 8782580 [details] [diff] [review]
Patch

We can try uplift now for this scrolling fix as the patches in 1286179 are now ready for uplift too.
Attachment #8782580 - Flags: approval-mozilla-beta?
Attachment #8782580 - Flags: approval-mozilla-beta+
Attachment #8782580 - Flags: approval-mozilla-aurora?
Attachment #8782580 - Flags: approval-mozilla-aurora+
Attachment #8783547 - Flags: approval-mozilla-beta?
Attachment #8783547 - Flags: approval-mozilla-beta+
Attachment #8783547 - Flags: approval-mozilla-aurora?
Attachment #8783547 - Flags: approval-mozilla-aurora+
See Also: → 1303649
You need to log in before you can comment on or make changes to this bug.