Netflix skip frames with layers.mlgpu.enabled=true on Intel GPUs on Win10

RESOLVED FIXED in Firefox 57

Status

()

Core
Graphics: Layers
P1
normal
RESOLVED FIXED
a month ago
16 days ago

People

(Reporter: cpearce, Assigned: dvander)

Tracking

57 Branch
mozilla58
x86
Windows 10
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox56 unaffected, firefox57blocking fixed, firefox58 fixed)

Details

(Whiteboard: [gfx-noted])

Attachments

(2 attachments)

(Reporter)

Description

a month ago
I noticed that Netflix is intermittently dropping frames in the compositor on my Dell XPS15/9550 laptop in Firefox Nightly on Windows 10 on the integrated Intel GPU. I don't think I see it with the NVIDIA GPU (which is blacklisted), but I get pretty bad tearing with that (blacklisted) GPU, so it's harder to tell.

I see this in Beta 57, and in Nightly 58. I'm running 64bit Firefox, e10s enabled.

This bug manifests as video only painting about half of the frames per second that it's supposed to; camera pans and movements don't look smooth, they appear to shudder. I don't always see this consistently, but it's quite noticeable when it's happening.

I mozregressioned this, and I believe this is a regression caused by bug 1375743. I can make the problem go away if I turn off layers.mlgpu.dev-enabled in Nightly.

Note, my XPS15 has a 4K screen, if that matters, and this is easier to see in fullscreen mode.

If need be, I can provide a Netflix test account to observe this on; just ping me on IRC and I can provide the login details.
(Reporter)

Comment 1

a month ago
Milan, David: can someone look into this? We don't want to ship Quantum with Netflix not looking good.
Flags: needinfo?(milan)
Flags: needinfo?(dvander)
[Tracking Requested - why for this release]: makes netflix unwatchable sometimes
tracking-firefox57: --- → ?
See Also: → bug 1402643
(Reporter)

Updated

a month ago
status-firefox57: --- → affected
status-firefox58: --- → affected
(Assignee)

Comment 3

a month ago
Is there a specific video on Netflix that consistently does this, or is it literally anything?
Flags: needinfo?(dvander) → needinfo?(cpearce)
(Reporter)

Comment 4

a month ago
(In reply to David Anderson [:dvander] from comment #3)
> Is there a specific video on Netflix that consistently does this, or is it
> literally anything?

I see it on pretty much any Netflix video.
Flags: needinfo?(cpearce)
Summary: Netflix skip frames with layers.mlgpu.dev-enabled=true on Intel GPUs on Win10 → Netflix skip frames with layers.mlgpu.enabled=true on Intel GPUs on Win10
status-firefox56: --- → unaffected
tracking-firefox57: ? → +
Bas is away this week, see if David has time.
Assignee: nobody → dvander
Flags: needinfo?(milan)
Whiteboard: [gfx-noted]
(Assignee)

Comment 6

a month ago
Is the only indication a visual effect? Or does some other metric we can measure jump as well? For example, the FPS counter if you use layers.acceleration.draw-fps?

Does it matter whether Netflix is windowed or full-screen?
Flags: needinfo?(cpearce)
(Reporter)

Comment 7

a month ago
(In reply to David Anderson [:dvander] from comment #6)
> Is the only indication a visual effect? Or does some other metric we can
> measure jump as well? For example, the FPS counter if you use
> layers.acceleration.draw-fps?

The compositor does not report the video frames it drops to VideoPlaybackQuality metrics.

The only indication is a visual effect; the frame drop is noticeable as stuttery motion.

With layers.acceleration.draw-fps=true I see us rendering about 50 fps in full screen mode. Interestingly, if I set layers.mlgpu.enabled=false, I see us rendering about 24/30 fps in fullscreen mode, i.e. at without mlgpu we render at roughly the same frame rate as the video. I'd assume that's more power efficient.

> Does it matter whether Netflix is windowed or full-screen?

This happens in both fullscreen and non-fullscreen video.
Flags: needinfo?(cpearce)
(Assignee)

Comment 8

a month ago
I can suddenly reproduce this - investigating.
Status: NEW → ASSIGNED
(Assignee)

Comment 9

a month ago
It looks to me as though two things are wrong here. One is that, as you noted, AL is compositing more frames than needed. I'm counting two composites per frame. The other problem is much more serious, sometimes AL is selecting an Image off the ImageLayer that should be composited about a second in the future. The result is the video appears to pull back for just a frame, making it look super janky.
See Also: → bug 1403598
Milan suggested we track this as a blocker for 57.
tracking-firefox57: + → blocking
(In reply to David Anderson [:dvander] from comment #9)
> It looks to me as though two things are wrong here. One is that, as you
> noted, AL is compositing more frames than needed. I'm counting two
> composites per frame. The other problem is much more serious, sometimes AL
> is selecting an Image off the ImageLayer that should be composited about a
> second in the future. The result is the video appears to pull back for just
> a frame, making it look super janky.

The later problem could explain the intermittent out of order compositing I reported in bug 1402643,but
(Assignee)

Comment 12

a month ago
The first bug is an invalidation bug. The old Compositor clones the layer tree after compositing - Advanced Layers does it before since I figured the properties were immutable during compositing. Turns out ImageHost::mLastFrameID is updated during compositing, so we should be cloning the layer properties after.
(Assignee)

Comment 13

a month ago
Created attachment 8920345 [details] [diff] [review]
part 1, refactor regions

So barring some kind of gross hack, AL should be caching invalidation properties after compositing, not before. However AL clobbers all over the shadow visible region for occlusion testing, and AL explicitly wants this to not affect invalidation.

The easiest solution is probably to store the occluded regions in a separate field that won't interfere with LayerTreeInvalidation.

This patch refactors all the Set/GetShadowVisibleRegion calls to be Get/SetRenderRegion instead, and stores the new region on LayerMLGPU.

There are still a few calls to Get/Set[Shadow/Local]VisibleRegion as necessary. They all happen before occlusion testing.
Attachment #8920345 - Flags: review?(matt.woodrow)
(Assignee)

Comment 14

a month ago
Created attachment 8920347 [details] [diff] [review]
part 2, clone layer properties later

This clones layer properties after compositing, so the correct FrameID is seen during invalidation. This fixes the framerate, and for me, seems to fix the jank bug as well (every messed up frame was always an "extra" frame).

I don't understand the interaction between media and the compositor well enough to know why this also fixes the jank bug. I'll do a try run shortly so others can confirm this.
Attachment #8920347 - Flags: review?(matt.woodrow)
(Assignee)

Comment 15

a month ago
When a Windows build is available below can anyone confirm if the problem is fixed?

https://tools.taskcluster.net/index/artifacts/gecko.v2.try.revision.b417c279a1a12ab08499e8f53200af5ee2a4669b
Flags: needinfo?(cpearce)
(Assignee)

Updated

a month ago
Flags: needinfo?(jyavenard)
Attachment #8920345 - Flags: review?(matt.woodrow) → review+
Attachment #8920347 - Flags: review?(matt.woodrow) → review+
I won't be able to test on Windows until Monday. I didn't experience the issue cpearce had, and the out of order issue is very intermittent.
Will report as soon as I'm able. 

By jank however, are you referring to jitter or apparent out of order composition?
(Assignee)

Comment 17

a month ago
Either one, really. Whatever was attributable to layers.mlgpu.enabled = true.

Comment 18

a month ago
Pushed by danderson@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fd03410adaba
Store the AL render region separately from the shadow visible region. (bug 1408781 part 1, r=mattwoodrow)
https://hg.mozilla.org/integration/mozilla-inbound/rev/4d2c317e3936
Cache invalidation properties after compositing, not before. (bug 1408781 part 2, r=mattwoodrow)
Backed out for build bustage at gfx/layers/mlgpu/PaintedLayerMLGPU.h:62: 'SetRenderRegion' overrides a member function but is not marked 'override':

https://hg.mozilla.org/integration/mozilla-inbound/rev/7f0e5af74534edde720068b1a802175810f1ab47
https://hg.mozilla.org/integration/mozilla-inbound/rev/6975e103f7535ac9eb830b4a6a10e5b0a5d3cc20

Build log (most build jobs are busted by a different issue, but they got retriggered and show up as failures soon): https://treeherder.mozilla.org/logviewer.html#?job_id=138539170&repo=mozilla-inbound

[task 2017-10-20T17:45:44.262Z] 17:45:44     INFO -  In file included from /builds/worker/workspace/build/src/obj-firefox/gfx/layers/Unified_cpp_gfx_layers9.cpp:65:
[task 2017-10-20T17:45:44.262Z] 17:45:44     INFO -  In file included from /builds/worker/workspace/build/src/gfx/layers/mlgpu/LayerManagerMLGPU.cpp:8:
[task 2017-10-20T17:45:44.262Z] 17:45:44     INFO -  /builds/worker/workspace/build/src/gfx/layers/mlgpu/PaintedLayerMLGPU.h:62:8: error: 'SetRenderRegion' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override]
[task 2017-10-20T17:45:44.262Z] 17:45:44     INFO -    void SetRenderRegion(LayerIntRegion&& aRegion);
[task 2017-10-20T17:45:44.262Z] 17:45:44     INFO -         ^
[task 2017-10-20T17:45:44.262Z] 17:45:44     INFO -  /builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/layers/LayerMLGPU.h:89:16: note: overridden virtual function is here
[task 2017-10-20T17:45:44.262Z] 17:45:44     INFO -    virtual void SetRenderRegion(LayerIntRegion&& aRegion);
[task 2017-10-20T17:45:44.262Z] 17:45:44     INFO -                 ^
Flags: needinfo?(dvander)
(Reporter)

Comment 20

a month ago
Note for Netflix to work, you need the "Windows 2012 tc Bs" jobs' target.zip, for example this build off inbound:
https://public-artifacts.taskcluster.net/BQtTJC0BTZaYoYvx5qaBHQ/0/public/build/target.zip

I could not repro the issue in that build.
Flags: needinfo?(cpearce)

Comment 21

a month ago
Pushed by danderson@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/97aeedf3aa5d
Store the AL render region separately from the shadow visible region. (bug 1408781 part 1, r=mattwoodrow)
https://hg.mozilla.org/integration/mozilla-inbound/rev/5bf42ef768d8
Cache invalidation properties after compositing, not before. (bug 1408781 part 2, r=mattwoodrow)
https://hg.mozilla.org/mozilla-central/rev/97aeedf3aa5d
https://hg.mozilla.org/mozilla-central/rev/5bf42ef768d8
Status: ASSIGNED → RESOLVED
Last Resolved: a month ago
status-firefox58: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
(Assignee)

Comment 23

a month ago
Comment on attachment 8920345 [details] [diff] [review]
part 1, refactor regions

This applies to both patches in this bug.

Approval Request Comment
[Feature/Bug causing the regression]: Advanced Layers
[User impact if declined]: Garbage frames when watching Netflix
[Is this code covered by automated tests?]:
[Has the fix been verified in Nightly?]: Yes
[Needs manual test from QE? If yes, steps to reproduce]: 
[List of other uplifts needed for the feature/fix]:
[Is the change risky?]: Not very, but it could have consequences.
[Why is the change risky/not risky?]: This changes invalidation, which is normally not something I like uplifting since it's not well covered by tests. The worst it can do is rendering glitches when the page changes. But the bug here is serious and there's not really any other fix. It's better to see if it holds up earlier rather than later.
[String changes made/needed]:
Flags: needinfo?(dvander)
Attachment #8920345 - Flags: approval-mozilla-beta?
Comment on attachment 8920345 [details] [diff] [review]
part 1, refactor regions

Completely agree with the risk assessment, the sooner we bake this on beta57 the better.
Attachment #8920345 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
https://hg.mozilla.org/releases/mozilla-beta/rev/a8e9602f18f6
https://hg.mozilla.org/releases/mozilla-beta/rev/aadb38c31b0c
status-firefox57: affected → fixed
Flags: needinfo?(jyavenard)
Blocks: 1406439

Updated

22 days ago
Depends on: 1412078
(Assignee)

Updated

16 days ago
No longer depends on: 1412078
You need to log in before you can comment on or make changes to this bug.