Closed Bug 1415272 Opened 7 years ago Closed 6 years ago

WebRender bad on Mozilla Hacks APZ parallax demo page

Categories

(Core :: Graphics: WebRender, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla64
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- disabled
firefox56 --- unaffected
firefox57 --- unaffected
firefox58 --- unaffected
firefox60 --- disabled
firefox61 --- disabled
firefox62 --- disabled
firefox63 --- disabled
firefox64 --- fixed

People

(Reporter: jhugman, Assigned: kats)

References

(Blocks 1 open bug, )

Details

(Whiteboard: [wr-reserve][retest])

Attachments

(2 files)

STR: 

1. Run Nightly
2. Enable WebRender as per https://mozillagfx.wordpress.com/2017/11/06/webrender-newsletter-9/
3. Read Belen's https://hacks.mozilla.org/2017/11/async-panzoom-apz-lands-in-firefox-quantum/
4. Navigate to parallax demo https://belen-albeza.github.io/scroll-demos/parallax.html

Expected results:
Being amazed at how smooth the experience feels. 

Observed results: 
Janky scrolling, with flickering layers.
Component: Untriaged → Graphics: WebRender
Product: Firefox → Core
Nightly 58 x64 20171108110838 de_DE f63559d7e6a570e4e73ba367964099394248e18d @ Debian Testing (KDE, Radeon RX480)
A) fresh profile: layers.acceleration.force-enabled, gfx.webrender.enabled
When scrolling, the content jumps up and down for some millimeter. And when Payun-payun has left the viewport on the top it shows up on the bottom again and scrolls up.

B) main profile: gpu process, layers force accel, webrender, blob-images, omtp, stylo-chrome
The page loaded very slow when visiting for the first time and was just blank. I suspected blob-images, but could reproduce it only once.

(In reply to James Hugman [:jhugman] [@jhugman] from comment #0)
Which OS and graphics card do you use?
See Also: → 1415609
See Also: → 1415610
Priority: -- → P2
Whiteboard: [wr-mvp] [triage] → [wr-mvp]
Sierra
MacBook Pro (Retina, 15-inch, Mid 2015)
AMD Radeon R9 M370X 2048 MB
Intel Iris Pro 1536 MB
I think the flicker problem is a APZ bug on nsDisplayPerspective. I didn't see the flicker after I disabled APZ.
Attached video 2017-11-23_16-12-34.mp4
Nightly 59 x64 20171123100420 de_DE b6bed1b710c3e22cab49f22f1b5f44d80286bcb9 @ Debian Testing (KDE, Radeon RX480)
fresh profile: layers.acceleration.force-enabled, gfx.webrender.enabled, gfx.webrender.blob-images, layers.async-pan-zoom.enabled;false

Disabling APZ suppressed that up-and-down shaking.

1. Open https://belen-albeza.github.io/scroll-demos/parallax.html by right click > Open in New Tab, so that you don't have an clickable back button.
2. Scroll down slowly.

Visual issues:
1. Something orange slips up behind the back button
2. "Payun-payun" is visible for a short moment on the bottom
3. The orange plane/spaceship is visible on the left for a moment
Assignee: nobody → nical.bugzilla
Status: NEW → ASSIGNED
Priority: P2 → P1
Whiteboard: [wr-mvp] → [wr-reserve]
Priority: P2 → P1
Summary: WebRender janky on Mozilla Hacks APZ parallax demo page → WebRender bad on Mozilla Hacks APZ parallax demo page
This might be related to nested perspective transforms. Removing the transform on the parallax-wrapper div that contains the other transformed elements seems to fix the issue.
Probably a part of the transform gets reapplied for each nested nsDisplayPerspective and is then passed to APZ. The display list itself before APZ kicks in appears to position things correctly in any case.
I don't think nical is actively working on this. Note also that the layout/reftests/async-scrolling/perspective-scrolling-3.html reftest is still marked as fails-if so it's quite plausible that our perspective handling is not quite solid enough. Fixing that reftest might be easier than trying to debug the full webpage, and fixing that might fix this.
Assignee: nical.bugzilla → nobody
Status: ASSIGNED → NEW
There seem to be a number of issues on the page, so I isolated one problem and investigated a little. Filed servo/webrender#2876 because the behaviour I'm seeing doesn't seem explainable by stuff on the gecko side, I suspect it's a WR issue.
Assignee: nobody → bugmail
Depends on: wr-3d
See Also: → 1480369
Taking this over until kats is back

hey glenn, how does this look with your 3d-transforms-fixed branch?
Assignee: kats → a.beingessner
Flags: needinfo?(gwatson)
It looks *almost* correct - the scrolling is smooth, and the parallax looks correct. There is still an occasional flicker during scrolling.
Flags: needinfo?(gwatson)
(In reply to Glenn Watson [:gw] from comment #9)
> It looks *almost* correct - the scrolling is smooth, and the parallax looks
> correct. There is still an occasional flicker during scrolling.

If it looks this good after Glenn lands his 3d-transform fix, then this no longer blocks pref'ing on in Nightly.  It would block release (move to stage-wr-trains, make it a P2).  I'd like to verify this though before we move it.  Alexis - can I ask you to verify this new behavior once Glenn's work lands?  Thanks, both!
Flags: needinfo?(a.beingessner)
Whiteboard: [wr-reserve] → [wr-reserve][retest]
There's definitely still something weird going on with this page - I'm not sure if it's 3d transform related or not. My guess is that it's something else (perhaps property animation related or something like that).

Either way, I don't think it justifies being a blocker for nightly (once https://github.com/servo/webrender/pull/3030 lands anyway).
I would be comfortable bumping this to trains-P1. It's a pretty egregious rendering bug, but not one on many pages.
Depends on: 1489937
Flags: needinfo?(a.beingessner)
Blocks: stage-wr-trains
No longer blocks: stage-wr-nightly
Hmm, so the page looks really good now, but the scrolling janks and jumps. It's possible we can ship that, or at least push to beta?
Flags: needinfo?(mreavy)
Investigated this a bit more, it looks like APZ and perspective are having a bad interaction, perhaps? Specifically if I scroll fast enough I think APZ gets "ahead" of the real position, and then snaps back?

Possibly a floating point precision issue?
Thanks, Alexis.  Definitely this doesn't need to block going to Beta.  The big question for Release is how bad is this compared to non-WR?
Flags: needinfo?(mreavy)
Priority: P1 → P2
non-wr is buttery smooth, for reference
mattwoodrow rather thinks that we're probably feeding perspective into the APZ infra incorrectly, leading to APZ computing a different velocity than layout, so the snaps are presumably layout "refreshing" our position to be correct.

Specifically matt is suspicious of https://searchfox.org/mozilla-central/source/layout/painting/nsDisplayList.cpp#8789 which has us setting aLayerData->SetTransformIsPerspective(true) on every nsDisplayTransform inside a perspective, where apparently vanilla gecko only sets it on the nsDisplayPerspective.

Some naive attempts at fixes didn't work; need to investigate deeper.

kats, since this code is your expertese, does this seem like something you'd like to look at when you get back to work?
Flags: needinfo?(kats)
Perspective and 3D stuff is not my favorite thing but sure, I can look at it when I'm back next week (26th will be my first day back, fyi).
Flags: needinfo?(kats)
Depends on: 1494685
Assignee: a.beingessner → kats
So after some poking around I've concluded that this is very likely the same problem that's causing layout/reftests/async-scrolling/perspective-scrolling-3.html to be failing. APZ is emitting the same async transform with or without WR, the problem is that it's not getting combined correctly with the other transforms in the ancestor elements. Most likely there's an assumption mismatch about the semantics of the scroll translation sent from gecko to WR. I'm going to use the reftest to debug this further since it's simpler and easier to get a wr-capture in the bad state and dive into the transforms.
Flipping the post_translate at [1] to a pre_translate seems to fix it. I had a moment of clarity where this change made sense but then it all fell out of my head. I'll have to spend some more time understanding the coordinate spaces and convincing myself this is the right thing to do (or convince myself that it's wrong).

[1] https://searchfox.org/mozilla-central/rev/819cd31a93fd50b7167979607371878c4d6f18e8/gfx/webrender/src/spatial_node.rs#264
The try push with that change is green too: https://treeherder.mozilla.org/#/jobs?repo=try&bugfiler=&group_state=expanded&revision=fd156919846cdc951c47d94e2c8df7b0df64fa47

But I still haven't convinced myself it's right. In fact I'm pretty close to convincing myself it's wrong :(
Ok, I convinced myself that the patch is correct. To first convince myself that the existing code is wrong, I used the test page at [1] which is a modified version of perspective-scrolling-3.html. There's two divs inside the root scrollframe, the left one has a translateZ(-1px) scale(2) while the right one does not. Since the left one is "farther away but bigger" it looks the same initially, but moves less slowly when scrolling.

From a display list point of view, both of these boxes are inside the same scrollframe. The left one creates a reference frame because of the transform, whereas the right one does not, but we can imagine it as also creating a reference frame with an identity transform. And applying a scroll offset to the scrollframe should somehow cause the "transform to world coordinate space" for the two boxes to be different.

Now, look at the transforms that get applied for a reference frame at [2] to convert its bounds to the world coordinate space. These are, in order:
 - info.source_perspective
 - info.source_transform
 - info.origin_in_parent_reference_frame
 - state.parent_accumulated_scroll_offset
 - state.parent_reference_frame_transform

In this list, the first three transforms are going to be different for the two boxes, while the remaining two are going to be the same. This implies that the position of two boxes relative to each other is defined by the difference in the first three boxes, and when scrolling, the two boxes will always be in the same position relative to each other. This is not the desired behaviour, as we concluded above.

Convincing myself that my patch is correct required realizing that the first three transforms (which are combined into info.relative_transform) actually need to be applied at the boundary of parent reference frame, rather than being thought of as "local properties" of the current reference frame. In other words, info.relative_transform is what is needed to convert the current reference frame into the coordinate space matching that of the parent reference frame. So, when converting to the "world space", it must be applied *after* transforms from scrollframes that are inside the parent reference frame. If the current reference frame is R, the parent reference frame is P, world space is W, and we have a scrollframe between the R and P, then the walk to the world space from R looks like this:

R -> S -> P -> W
     |    |    +--> state.parent_reference_frame_transform
     |    +-------> info.relative_transform
     +------------> state.parent_accumulated_scroll_offset

which corresponds to the ordering:
 - state.parent_accumulated_scroll_offset
 - info.source_perspective
 - info.source_transform
 - info.origin_in_parent_reference_frame
 - state.parent_reference_frame_transform

which is what my patch does.

[1] https://mozilla.staktrace.com/bug/1415272.html
[2] https://searchfox.org/mozilla-central/rev/819cd31a93fd50b7167979607371878c4d6f18e8/gfx/webrender/src/spatial_node.rs#253-268
This will need to land with the WR update that includes servo/webrender#3154
Attachment #9013693 - Flags: review?(jmuizelaar)
Attachment #9013693 - Flags: review?(jmuizelaar) → review+
Pushed by jmuizelaar@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1eb163b766fd
Enable perspective-scrolling-3 as it passes with WR PR 3154. r=jrmuizel
https://hg.mozilla.org/mozilla-central/rev/1eb163b766fd
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
Backed out (both the WR PR and the test annotation) for causing bug 1496416 and dupes:

https://hg.mozilla.org/integration/mozilla-inbound/rev/4bcce93eae6401679746dda3cd7c192aba62f4f5

The WR PR is getting backed out upstream in servo/webrender#3165 as well
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I spent a good chunk of yesterday trying to figure out why my solution was wrong.

I think the first part of what I said in comment 23 (i.e. why the existing code is wrong) still makes sense. But the second part (why the new patch is correct) didn't make sense when I thought about in terms of 2D transforms. It seems pretty clear that you want to apply the 2D transform first, then do the scrolling translation, otherwise you end up changing the origin that the 2D transform is applied relative to. And in fact that's exactly what happened in bug 1496416 and dupes, resulting in weird behaviours for non-translation 2D transforms.

So then I spent some more time trying to picture all this in my head. A few times I came up with what I thought was the right solution but wasn't. Finally I resorted to numerical techniques to end up with the matrix I wanted, and ended up with a solution where I apply a basis-change operation on the perspective matrix. That seems to work in my local testing, but I can't claim to have any intuition as to why.

Try push with that fix: https://treeherder.mozilla.org/#/jobs?repo=try&bugfiler=&group_state=expanded&revision=b5b03f9334ff26c20354a0f967c04a232c6fd30c
QA Contact: mreavy
As before, we'll have to land the patch on this bug (to remove the fails-if) when the PR gets merged to gecko.
QA Contact: mreavy
Depends on: 1497768
No longer depends on: 1497768
Should be fixed with the WR update into m-c.
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: