Closed Bug 1213095 Opened 4 years ago Closed 4 years ago

Intermittent test_scroll_inactive_bug1190112.html | subframe should have scrolled

Categories

(Core :: Panning and Zooming, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla46
Tracking Status
firefox46 --- fixed

People

(Reporter: KWierso, Assigned: kats)

References

Details

(Keywords: intermittent-failure, Whiteboard: [gfx-noted])

Attachments

(5 files, 1 obsolete file)

Blanket gfx-noted on APZ intermittent failures so they can be found more easily on https://brasstacks.mozilla.com/orangefactor/
Whiteboard: [gfx-noted]
Hit this with logging at https://treeherder.mozilla.org/#/jobs?repo=try&revision=e07a7b5d492a&group_state=expanded&selectedJob=15454624. From the logging it looks like if the main-thread needs to layerize a scrollable frame by setting a displayport on it, we might get a paint before the wheel event is actually processed on the APZ side, and then another paint (with a new scroll offset) after the wheel event is processed. sendWheelAndPaint however unblocks after the first paint and so if the scroll offset is read right after that it won't reflect the scroll offset update from the wheel scrolling. Bug 1210117 is likely the same issue.

Relevant log output:

10:16:26     INFO -  INPQ: started new scroll wheel block 0x9103b580 id 6 for target 0x8fc45000
10:16:26     INFO -  INPQ: scheduling main thread timeout for target 0x8fc45000
10:16:26     INFO -  APZCCH: For event at (31,543) found scrollable element 0xa3df4c90 (p@a3df4c90 id="subframe")
10:16:26     INFO -  APZCCH: 0xa3df4c90 didn't have a displayport, so setting one...
10:16:26     INFO -  APZC: Calculated displayport as (0.000000 0.000000 187.000000 700.000000) from velocity (0,0) paint time 50.000000 metrics:{ [cb=(x=0.000000, y=0.000000, w=187.000000, h=200.000000)] [sr=(x=0.000000, y=0.000000, w=187.000000, h=760.000000)] [s=(0,0)] [dp=(x=0.000000, y=0.000000, w=0.000000, h=0.000000)] [cdp=(x=0.000000, y=0.000000, w=0.000000, h=0.000000)] [color=rgba(0, 0, 0, 0.000000) [dpm=(l=0.000000, t=0.000000, r=0.000000, b=0.000000)] um=0] [rcs=(w=1280.000000, h=969.000000)] [v=(x=0.000000, y=0.000000, w=0.000000, h=0.000000)] [z=(ld=1.000 r=1.000 cr=1 z=1 er=1)] [u=(0 0 0)] [p=0] [i=(-1 0 0)] }

Above we see the APZCCH set a displayport to layerize the scrollable element. The displayport set is (0,0,187,700).

10:16:26     INFO -  APZCCH: At least one target got a new displayport, need to wait for refresh
10:16:26     INFO -  APZCCH: Successfully registered post-refresh observer
10:16:26     INFO -  APZC: 0x8fc2a800 sending scroll update acknowledgement with gen 37
10:16:26     INFO -  APZC: 0x8fc2a800 detected non-empty margins which probably need updating
10:16:26     INFO -  APZC: Calculated displayport as (0.000000 0.000000 187.000000 700.000000) from velocity (0,0) paint time 50.000000 metrics:{ [cb=(x=22.000000, y=534.400024, w=187.000000, h=200.000000)] [sr=(x=0.000000, y=0.000000, w=187.000000, h=760.000000)] [s=(0,0)] [dp=(x=0.000000, y=0.000000, w=187.000000, h=760.000000)] [cdp=(x=0.000000, y=0.000000, w=0.000000, h=0.000000)] [color=rgba(0, 0, 0, 0.000000) [dpm=(l=-0.000000, t=-0.000000, r=0.000000, b=500.000000)] um=0] [rcs=(w=1280.000000, h=969.000000)] [v=(x=22.000000, y=534.400024, w=187.000000, h=200.000000)] [z=(ld=1.000 r=1.000 cr=1 z=1 er=1)] [u=(1 0 37)] [p=13] [i=(10 15 0)] }
10:16:26     INFO -  APZC: 8fc2a800 requesting content repaint:{ [cb=(x=22.000000, y=534.400024, w=187.000000, h=200.000000)] [sr=(x=0.000000, y=0.000000, w=187.000000, h=760.000000)] [s=(0,0)] [dp=(x=0.000000, y=0.000000, w=187.000000, h=760.000000)] [cdp=(x=0.000000, y=0.000000, w=0.000000, h=0.000000)] [color=rgba(0, 0, 0, 0.000000) [dpm=(l=-0.000000, t=-0.000000, r=0.000000, b=500.000000)] um=1] [rcs=(w=1280.000000, h=969.000000)] [v=(x=22.000000, y=534.400024, w=187.000000, h=200.000000)] [z=(ld=1.000 r=1.000 cr=1 z=1 er=1)] [u=(1 0 37)] [p=13] [i=(10 15 0)] }

Here, the APZC code at [1] kicks in because of the non-empty displayport, and triggers a repaint with new margins (and scroll offset 0,0).

10:16:26     INFO -  INPQ: got a content response; block=6
10:16:26     INFO -  INPQ: got a target apzc; block=6 guid={ l=2, p=10, v=15 }
10:16:26     INFO -  INPQ: processing input block 0x9103b580; preventDefault 0 target 0x8fc2a800
10:16:26     INFO -  APZC: 0x8fc2a800 got a scroll-wheel with delta (0,19)
10:16:26     INFO -  APZC: 0x8fc2a800 running CancelAnimation in state 0
10:16:26     INFO -  APZC: 0x8fc2a800 changing from state 0 to 0
10:16:26     INFO -  APZC: Calculated displayport as (0.000000 -19.000000 187.000000 700.000000) from velocity (0,0) paint time 50.000000 metrics:{ [cb=(x=22.000000, y=534.400024, w=187.000000, h=200.000000)] [sr=(x=0.000000, y=0.000000, w=187.000000, h=760.000000)] [s=(0,19)] [dp=(x=0.000000, y=0.000000, w=187.000000, h=760.000000)] [cdp=(x=0.000000, y=0.000000, w=0.000000, h=0.000000)] [color=rgba(0, 0, 0, 0.000000) [dpm=(l=-0.000000, t=-0.000000, r=0.000000, b=500.000000)] um=1] [rcs=(w=1280.000000, h=969.000000)] [v=(x=22.000000, y=534.400024, w=187.000000, h=200.000000)] [z=(ld=1.000 r=1.000 cr=1 z=1 er=1)] [u=(1 0 37)] [p=13] [i=(10 15 0)] }
10:16:26     INFO -  APZC: 8fc2a800 requesting content repaint:{ [cb=(x=22.000000, y=534.400024, w=187.000000, h=200.000000)] [sr=(x=0.000000, y=0.000000, w=187.000000, h=760.000000)] [s=(0,19)] [dp=(x=0.000000, y=0.000000, w=187.000000, h=760.000000)] [cdp=(x=0.000000, y=0.000000, w=0.000000, h=0.000000)] [color=rgba(0, 0, 0, 0.000000) [dpm=(l=-0.000000, t=19.000000, r=0.000000, b=481.000000)] um=1] [rcs=(w=1280.000000, h=969.000000)] [v=(x=22.000000, y=534.400024, w=187.000000, h=200.000000)] [z=(ld=1.000 r=1.000 cr=1 z=1 er=1)] [u=(1 0 37)] [p=13] [i=(10 15 0)] }

and *then* the wheel event gets processed, updates the scroll offset, and triggers another repaint. So the spurious first repaint happens because the displayport that the main thread puts on the scrollable element is non-empty and different from the paint info that APZC computes for it. If we can either (a) make the main-thread displayport empty or (b) make the APZC compute the same paint parameters or (c) remove the displayport recomputation entirely, that will probably fix this bug.

I think (c) is not a viable option, because there are still cases where that code is useful. So either (a) or (b). I'll look at this further. 

[1] https://dxr.mozilla.org/mozilla-central/rev/4e2224c009dfedfcd95035e2fc67779567c2cdea/gfx/layers/apz/src/AsyncPanZoomController.cpp#3175
Assignee: nobody → bugmail.mozilla
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #15)
> So the spurious first repaint happens because the
> displayport that the main thread puts on the scrollable element is non-empty
> and different from the paint info that APZC computes for it. If we can
> either (a) make the main-thread displayport empty or (b) make the APZC
> compute the same paint parameters or (c) remove the displayport
> recomputation entirely, that will probably fix this bug.

Just to clarify this a bit more:
- the main thread computes reasonable margins for the displayport it sets on the element to layerize it.
- it's reasonable for the main thread to put non-empty margins there, because the element will likely start scrolling soon and we want some buffer against checkerboarding
- when the APZC gets the NotifyLayersUpdated call with non-empty margins, it doesn't (can't?) distinguish between these two cases: (1) an element that just got layerized and got margins computed from APZCCallbackHelper and (2) an element that has stale margins left over from some previous APZC or layerization. Therefore it is reasonable for the APZC to recompute margins and request a paint

And actually after looking at the sendWheelAndPaint code a bit more I'm not convinced this extra paint is the problem, because the sendWheelAndPaint code just waits for an APZ flush. The problem might be that the APZ flush doesn't wait for input events stuck in the input queue to get processed.
I made the APZ flush wait for pending input events, but that didn't fix the problem. It looks like something is calling UpdateSubFrame and clobbering the scroll offset back to 0,0 after it got moved by the input event's APZC repaint. Digging more...
Ah! So when the APZC is created with the non-empty margins, it calls RequestContentRepaint from the compositor thread. When that call gets into DispatchRepaintRequest, it has to bounce the request to the main thread. Between that happening and the request running on the main thread, the wheel event gets processed and dispatches its own repaint request directly on the main thread. So effectively the repaint requests get reordered, and so even though the scroll offsets should be (0,0) followed by (0,19), they end up begin (0,19) followed by (0,0). This seems like a legitimate bug that we should fix.
Fixing the thing in comment 18 seems to have resolved the intermittent failure. Try push at [1] to verify that, and [2] to make sure it didn't break anything else.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=faeaf229cca9
[2] https://treeherder.mozilla.org/#/jobs?repo=try&revision=0fce5c61a536
Not really part of this fix, but I ran into while enabling the APZES_LOG, and I'm too lazy to file a new bug.
Attachment #8708842 - Flags: review?(botond)
This function is only called from the one place, and inlining it makes the future patches cleaner.
Attachment #8708843 - Flags: review?(botond)
Updates to the framemetrics inside RequestContentRepaint are hoisted out into the call sites. This is helpful for part 4. I also just dropped the presShellId updating because I never really understood the purpose of that and it seems unnecessary. If something breaks as a result we should find a proper fix for it.
Attachment #8708845 - Flags: review?(botond)
This is the main fix. By hoisting the redispatch-to-main-thread up a bit, we can make sure that we don't have the sort of race conditions described above. (To recap, the race condition happened because a copy of mFrameMetrics was captured and posted to the main thread, but then the mFrameMetrics was modified by another operation on the main thread first, and then the old copy was posted and clobbered the newer one).
Attachment #8708846 - Flags: review?(botond)
Attached patch Part 5 - Enable chaos mode (obsolete) — Splinter Review
Because why not.
Attachment #8708847 - Flags: review?(botond)
Actually we can enable chaos mode on two tests.
Attachment #8708847 - Attachment is obsolete: true
Attachment #8708847 - Flags: review?(botond)
Attachment #8708849 - Flags: review?(botond)
Attachment #8708842 - Flags: review?(botond) → review+
Attachment #8708843 - Flags: review?(botond) → review+
Comment on attachment 8708845 [details] [diff] [review]
Part 3 - Don't let RCR mutate the FrameMetrics

Review of attachment 8708845 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah, the presShellId updating doesn't make sense to me either - seems like getting a new presShellId should cause us to get a new APZC. Perhaps this is a relic from the days when we weren't comparing the presShellIds in ScrollableLayerGuids.
Attachment #8708845 - Flags: review?(botond) → review+
Comment on attachment 8708846 [details] [diff] [review]
Part 4 - Main fix

Review of attachment 8708846 [details] [diff] [review]:
-----------------------------------------------------------------

It's a bit unfortunate that we can't reuse more code between the ZoomToRect() call site and the RequestContentRepaint() call site, but I guess the whole point is that in the RequestContentRepaint() case we don't read the value of mFrameMetrics until we're running on the main thread, whereas in ZoomToRect() the metrics we want to send is only available when we do the dispatch.

::: gfx/layers/apz/src/AsyncPanZoomController.cpp
@@ +2814,5 @@
>    return listener ? listener->GetLastTouchIdentifier() : -1;
>  }
>  
>  void AsyncPanZoomController::RequestContentRepaint() {
> +  if (!NS_IsMainThread()) {

Please add a comment saying it's important that the redispatch happens before the CalculatePendingDisplayPort() call (which we want to be looking at an up-to-date mFrameMetrics at the time it's run).
Attachment #8708846 - Flags: review?(botond) → review+
Attachment #8708849 - Flags: review?(botond) → review+
I had to rebase this over bug 1226826 which was straightforward although non-trivial, because of the changes in that bug that passed the velocity into DispatchRepaintRequest. Since DispatchRepaintRequest got inlined and mutations to the framemetrics got moved out, the velocity is now passed into RequestContentRepaint.

(In reply to Botond Ballo [:botond] from comment #28)
> > +  if (!NS_IsMainThread()) {
> 
> Please add a comment saying it's important that the redispatch happens
> before the CalculatePendingDisplayPort() call (which we want to be looking
> at an up-to-date mFrameMetrics at the time it's run).

I added this comment as well.
Depends on: 1269066
You need to log in before you can comment on or make changes to this bug.