Closed Bug 1416650 Opened 7 years ago Closed 6 years ago

Bad performance on running tpaint talos suite

Categories

(Core :: Graphics: WebRender, defect, P1)

x86_64
All
defect

Tracking

()

RESOLVED FIXED

People

(Reporter: vliu, Assigned: mattwoodrow)

References

Details

(Whiteboard: [gfx-noted])

See [1] for the Comparison link in detail. It showed that

tpaint opt e10s in Linux x64 opt: 47.79%
tpaint opt e10s stylo_disabled in Linux x64 opt: 30.52%

[1]: Comparison link in Linux x64 opt
     https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=226893bbeb3e&newProject=try&newRevision=624d8ef848c2d76eed6a2854e03e3c54950ffd12&framework=1
Blocks: 1416635
Whiteboard: [gfx-noted]
https://treeherder.mozilla.org/perf.html#/graphs?series=autoland,1646250,1,1&series=autoland,1683322,1,1&series=autoland,1651465,1,1&series=autoland,1684580,1,1

Linux and Windows are both worse with WR enabled than without. Bumping to P1 because of Windows regression.
Component: Graphics: Layers → Graphics: WebRender
OS: Linux → All
Priority: P3 → P1
Summary: Bad performance on running tpaint talos suite in Linux x64 opt → Bad performance on running tpaint talos suite
This got worse on windows10-64-qr because of bug 1471962
Blocks: 1471962
Bug 1476876 helped on Windows but it's still worse with WR.
Depends on: 1476876
Assignee: nobody → a.beingessner
We can't release this to the field, but we can let this ride to beta.  However, we want to investigate and understand the bad perf numbers asap.
Priority: P1 → P2
Whiteboard: [gfx-noted] → [gfx-noted][needs-investigation]
Assignee: a.beingessner → nobody
Assignee: nobody → matt.woodrow
It looks like most of the slowness is compiling shaders: https://perfht.ml/2QhOSh9

Dan has a patch to try cache some of these which takes the score from 284 -> 206 (Gecko is ~120).

Waiting on a try run with a new profile to see what the remainder is.
New profile with Dan's patch: https://perfht.ml/2Mgxma0

Better, but still a 130ms composite, with all the time spent within glProgramBinary.

We should be able to cache shader compilation between windows, and I think that would be sufficient to fix the regression here.

I did some digging in to this, and it looks we create a separate GLContext for each window, which we pass to WR. Internally within ANGLE though, we share the same underlying D3D11 device and all of the EGLWindow structures (including Renderer11) should be common across windows.

I had a few ideas of how we can fix this, approximately in order of ascending difficulty and quality.

* Keep a cache of the shader objects created within Renderer11::loadExecutable [1], and detect when we duplicate call with the same parameters and return one of the cached objects instead.

A quick hack might be to compare the input pointers, but I suspect we'll need to hash the contents of the input in order to have something actually shippable.

* ANGLE (and all EGL implementations) support a way to request that resources are shared between GLContexts.

We could create a shared global 'resources' context, and teach WR to use that to allocate a single set of shaders, which are then given to each of the windows as they are created.

GL context sharing can sometimes be slow, since it usually also enables locking (each context can be used on a different thread simultaneously). I'm not sure if this will be an issue with ANGLE.

* Use a single GLContext for all the windows. Instead of having a GLContext with the underlying window as the backbuffer, we'd have to create framebuffer objects for each underlying window, and swap them in so that WR renders to that instead.

We do something similar for WebGL, so jgilbert might have some advice for that.


Removing [needs-investigation] now, since I think this regression is well understood.


[1] https://searchfox.org/mozilla-central/rev/d4ef4e9747133aa2914aca2a15cf9df1e42a6aa0/gfx/angle/checkout/src/libANGLE/renderer/d3d/d3d11/Renderer11.cpp#2647
Flags: needinfo?(dglastonbury)
Whiteboard: [gfx-noted][needs-investigation] → [gfx-noted]
Thanks for writing that up!

Those ideas all sound plausible, and the caveats you have with each also make sense to me.

Sharing a GL context between windows sounds least favorable to me, it seems like that might impact performance in the case of multiple windows more than the other options, although that's just a guess.

If it's of any use, we could certainly provide some kind of unique identifier from WR for each shader that would save needing to hash the entire input strings. Although, I'm not sure how we'd fit this into ANGLE without some kind of hacks / extensions to the existing API, so I'm not sure it'd be worthwhile doing?
(In reply to Matt Woodrow (:mattwoodrow) from comment #6)
> * Use a single GLContext for all the windows. Instead of having a GLContext
> with the underlying window as the backbuffer, we'd have to create
> framebuffer objects for each underlying window, and swap them in so that WR
> renders to that instead.

We'll want this for Mac, too, bug 1449205.
(In reply to Markus Stange [:mstange] from comment #8)
> (In reply to Matt Woodrow (:mattwoodrow) from comment #6)
> > * Use a single GLContext for all the windows. Instead of having a GLContext
> > with the underlying window as the backbuffer, we'd have to create
> > framebuffer objects for each underlying window, and swap them in so that WR
> > renders to that instead.
> 
> We'll want this for Mac, too, bug 1449205.

I assume that context sharing would be an option on Mac too, right? Maybe not one that we want to take (it's easier to trust ANGLE, since we can fix it), but it should be feasible.
That's true, yes.
(In reply to Glenn Watson [:gw] from comment #7)
> Sharing a GL context between windows sounds least favorable to me, it seems
> like that might impact performance in the case of multiple windows more than
> the other options, although that's just a guess.

I'd hope that it wouldn't, doing rendering to an FBO instead of the default backbuffer shouldn't perform differently, and we don't have to risk paying mutex locking costs like we do with context sharing.

That said, GL drivers can be weird, so who knows!

> 
> If it's of any use, we could certainly provide some kind of unique
> identifier from WR for each shader that would save needing to hash the
> entire input strings. Although, I'm not sure how we'd fit this into ANGLE
> without some kind of hacks / extensions to the existing API, so I'm not sure
> it'd be worthwhile doing?

I suspect adding new features to ANGLE isn't going to be worth it, since we'll have the same issue when we try roll WR out to other platforms and we can't add the feature there.

My current feeling is that sharing the GLContext for all windows is going to be the best long term strategy.

We might want to consider add the cache within ANGLE as a short term fix though, since that seems like only a day or two's work, and that's a pretty nice advantage for a regression that's going to block release.
(In reply to Matt Woodrow (:mattwoodrow) from comment #6)
> New profile with Dan's patch: https://perfht.ml/2Mgxma0
> 
> * Use a single GLContext for all the windows. Instead of having a GLContext
> with the underlying window as the backbuffer, we'd have to create
> framebuffer objects for each underlying window, and swap them in so that WR
> renders to that instead.

RenderCompositorANGLE already split direct connection between HWND and GLContext. GLContext is created by calling GLContextProviderEGL::CreateHeadless(). And attach EGLSurface by calling GLContextEGL::SetEGLSurfaceOverride().

https://dxr.mozilla.org/mozilla-central/source/gfx/webrender_bindings/RenderCompositorANGLE.cpp#438

EGLSurface is created by calling eglCreatePbufferFromClientBuffer()

https://dxr.mozilla.org/mozilla-central/source/gfx/webrender_bindings/RenderCompositorANGLE.cpp#428
Oh awesome, that makes it seem much easier. Thanks Sotaro!
Depends on: 1492723
Hey Matt -- Where does this stand?  How big of a problem do you think this is?
Flags: needinfo?(matt.woodrow)
Dan's patches in bug 1492723 go a long way towards fixing this, and I'm experimenting with adding parallel shader compiling to ANGLE in order to get the remainder.

It's a startup regression, so the sort of thing people are likely to get upset about, but we can probably go to beta with it if we had to.
Flags: needinfo?(matt.woodrow)
Priority: P2 → P1
With all the patches from bug 1492723 and dependents landed, this went from 266 to 130, and now matches non-WR almost exactly.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
\o/
Flags: needinfo?(dglastonbury)
You need to log in before you can comment on or make changes to this bug.