Closed Bug 1486958 Opened Last year Closed Last year

Resizing window under high system load is worse under WebRender

Categories

(Core :: Graphics: WebRender, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla64
Tracking Status
firefox64 --- fixed

People

(Reporter: bholley, Assigned: sotaro)

References

(Blocks 1 open bug)

Details

(Whiteboard: [gfx-noted])

Attachments

(1 file, 3 obsolete files)

STR on 2017 Quantum Reference device:
* Load https://jsfiddle.net/159pokym/24/ in another browser (i.e. chrome)
* Click "Start Memory Worker" 4 times (the number of virtual cores on the device)
* Load a page under WebRender (I used https://www.w3.org/TR/html5/single-page )
* Resize back and forth horizontally

Under WebRender, the right side of the screen paints regions of white and black for the entire window, including the browser chrome. Without, the right side of the screen paints gray, and the browser chrome resizes smoothly.

Interestingly, Chrome behaves like WR. So this is a regression, but puts us at rough Chrome parity. The current experience in Firefox is really good though, and it would be a shame to lose.
Priority: -- → P2
It looks like disabling direct composition helps with this, on my machine at least.

Our Windows widget code schedules a composition immediately when it receives the resize (which composites the old content into the newly sized buffers, and would show the black at the edge), and then goes into painting the window at the new size.

My guess at the moment is that usually we manage to paint the chrome document and get that into a composition quick enough that the next vsync presents only the latter.

If direct composition is stopping us from discard duplicate presentation within an interval, then we'd be showing the frame with the incorrectly sized content.
My testing so far suggests that the problem only happens with DirectComposition and the flip model that has the problem. Using the flip model with a normal swap chain seems fine, as does using the bitblt model with DirectComposition.

It's possible that trying to fix the bugs we had with the flip model (bug 1435995) would be a path forward here.

I found some relevant information here: https://docs.microsoft.com/en-us/windows/desktop/api/d3d11/nf-d3d11-id3d11devicecontext-flush.

Sotaro, how can I reproduce the fallback from WR to normal being broken (with the flip model)?
Flags: needinfo?(sotaro.ikeda.g)
Whiteboard: [gfx-noted]
Setting WS_EX_NOREDIRECTIONBITMAP on the window (which is what Edge appears to be using) helps, but there still is some weird glitching happening on the top-right corner.
It looks like we switched to using DirectComposition (bug 1191971) in order to take advantage of the flip-model (DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL). The flip model saves memory bandwidth by swapping buffers instead of copying, and improves performance. 

Advanced Layers doesn't use the flip model, but instead saves memory bandwidth by taking advantage of partial composition. We do probably need at least one of these things in order to have comparable performance. In theory we could have both, but it looks like the flip model never got enabled for Advanced Layers/D3D11 (but doesn't appear to be blocked).

Bug 1435995 covers the attempt to use the flip-model with WR using a normal (non-DirectComposition) swap chain. It had to be disabled, since we weren't able to draw to the Window again if we hit fallback.

I can reproduce the failure, both when the GPU process crashes (forced from about:support), and by faking a WR error result from wr_renderer_render (the recovered state is visually corrupted for a while in the 'good' state, but eventually works).

The DirectComposition landing also added a new OS-level Window (HWND) object, owned by the GPU process. If I keep this enabled, but use a non-DirectComposition flip-model swap chain on it, then everything appears to work correctly.

Resizing performance looks good, we should be getting the memory bandwidth reduction of the flip-model, and the fallback works correctly.

The only obvious downside I've found so far is that when the GPU process crashes, the child HWND gets deleted, so the browser displays blank white while we restart the process (~1 second). When rendering to the root HWND (owned by the chrome process), or DirectComposition, we manage to keep displaying the old content while we restart (though obviously it's non responsive).

That's a bit of a tradeoff, but I think it's worth it. In theory we could have the chrome process draw something to the root HWND as a placeholder to make it a bit nicer.
I did some testing, and took videos:

Firefox: FxWR, FxWR + comment 5, Fx: https://photos.app.goo.gl/tAXjEZ7gdpgBkNKC8
Chrome: https://photos.app.goo.gl/pCP6kNqQG4a1JXDD7

My subjective impression is that the patch in comment 5 improves things over stock WebRender, and reaches Chrome parity, but isn't as good as stock Firefox. A lot of this has to do with the fill color being black/white instead of Gray, and gating browser-chrome rendering on content rendering. Document splitting should fix that, but we probably don't want to block on that for the MVP.
This bug seems to be triggered by Bug 1476876. Even with direct composition, it seems that the problem could be addressed to same level to Comment 5. But it could regress talos performance like Bug 1380462.
Flags: needinfo?(sotaro.ikeda.g)
Assignee: nobody → sotaro.ikeda.g
See Also: → 1380462
(In reply to Sotaro Ikeda [:sotaro] from comment #7)
> This bug seems to be triggered by Bug 1476876. Even with direct composition,
> it seems that the problem could be addressed to same level to Comment 5. But
> it could regress talos performance like Bug 1380462.

We're already paying that cost with normal gecko though right? It seems like it should be possible to get equivalent performance (even if that's not perfect) with WR.
(In reply to Matt Woodrow (:mattwoodrow) from comment #10)
> (In reply to Sotaro Ikeda [:sotaro] from comment #7)
> > This bug seems to be triggered by Bug 1476876. Even with direct composition,
> > it seems that the problem could be addressed to same level to Comment 5. But
> > it could regress talos performance like Bug 1380462.
> 
> We're already paying that cost with normal gecko though right? It seems like
> it should be possible to get equivalent performance (even if that's not
> perfect) with WR.

With WebRender, gecko pays more time for sync FlushRendering() because of webrender multiple threading model.
The Windows widget code knows if we're resizing, can we use that and just do sync for Resizes?

Not being sync during startup/new window seems nice, since the main thread can keep doing other things.
(In reply to Matt Woodrow (:mattwoodrow) from comment #12)
> The Windows widget code knows if we're resizing, can we use that and just do
> sync for Resizes?

I am going to check if it is possible.
Attachment #9008311 - Flags: review?(matt.woodrow)
Comment on attachment 9008311 [details] [diff] [review]
patch - sync FlushRendering() only during resizing on windows

Needs to update the patch.
Attachment #9008311 - Flags: review?(matt.woodrow)
Attachment #9007977 - Attachment is obsolete: true
Attachment #9008311 - Attachment is obsolete: true
Attachment #9008609 - Flags: review?(matt.woodrow)
Comment on attachment 9008609 [details] [diff] [review]
patch - sync FlushRendering() only during resizing on windows

Review of attachment 9008609 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/wr/WebRenderLayerManager.cpp
@@ -637,5 @@
>    }
>    MOZ_ASSERT(mWidget);
>  
> -  // When DirectComposition and compositor window are used, we do not need to do sync FlushRendering.
> -  if (WrBridge()->GetCompositorUseDComp()) {

It might be safest to only make this change for DComp.

So make this if (DComp && !resizing) { async } else if ...

I don't think we should change ClientLayerManager for now. Alternatively, change them, but in a separate patch. That way we can easily back out for regressions without affecting what we need for WR.
Attachment #9008609 - Flags: review?(matt.woodrow) → review+
Comment on attachment 9008609 [details] [diff] [review]
patch - sync FlushRendering() only during resizing on windows

Review of attachment 9008609 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/wr/WebRenderLayerManager.cpp
@@ -637,5 @@
>    }
>    MOZ_ASSERT(mWidget);
>  
> -  // When DirectComposition and compositor window are used, we do not need to do sync FlushRendering.
> -  if (WrBridge()->GetCompositorUseDComp()) {

For ClientLayerManager, I am going to split to different bug.
Apply the comment.
Attachment #9008609 - Attachment is obsolete: true
Attachment #9008940 - Flags: review+
(In reply to Sotaro Ikeda [:sotaro] from comment #23)
> talos with attachment 9008940 [details] [diff] [review] 
> 
> https://treeherder.mozilla.org/perf.html#/
> compare?originalProject=try&originalRevision=6681672964d9fd43916cf85ae9736388
> 5d51df30&newProject=try&newRevision=f557c7c210dfce40aabb1fc75db8e021f35d0044&
> framework=1

It looks OK!
Pushed by sikeda@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ca89cc5c6a17
Sync FlushRendering() only during resizing on windows r=mattwoodrow
https://hg.mozilla.org/mozilla-central/rev/ca89cc5c6a17
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
See Also: → 1496253
You need to log in before you can comment on or make changes to this bug.