Improve performance of mix-blend
Categories
(Core :: Graphics: WebRender, task)
Tracking
()
Tracking | Status | |
---|---|---|
firefox88 | --- | fixed |
People
(Reporter: gw, Assigned: gw)
References
(Depends on 1 open bug)
Details
Attachments
(3 files)
Assignee | ||
Updated•3 months ago
|
Assignee | ||
Updated•3 months ago
|
Assignee | ||
Comment 1•3 months ago
|
||
Restructure how frame building handles building of the surface
render task structures.
In future parts of this bug, a surface may have multiple "sub-passes"
where the rendering is separated into different render tasks that
write to the same target (without clear on subsequent sub-passes).
For example, this will be used to allow mix-blend-mode and
backdrop-filter to issue readbacks directly from picture cache
tiles rather than forcing draws to an intermediate surface.
The add_child_render_task
method will be expanded to support
the case of a surface having multiple sub-passes, adding the
render task to the correct sub-pass.
Pushed by gwatson@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f7b7b8b13507 Pt 1 - Refactor SurfaceRenderTasks r=nical
Comment 3•3 months ago
|
||
bugherder |
Assignee | ||
Comment 4•3 months ago
|
||
Instead, top level tile cache pictures are stored in the scene.
Follow up tasks in this bug will be simplified by having pictures
only exist when they have Some(..) for requested_composite_mode.
This patch removes one case of a pass-through picture, and
simplfies some of the surrounding code in the process.
Pushed by gwatson@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d123c4ed95cb Pt 2 - Remove root_pic_index. r=nical
Comment 6•3 months ago
|
||
bugherder |
Assignee | ||
Updated•2 months ago
|
Assignee | ||
Comment 7•2 months ago
|
||
This patch enables the faster mix-blend-mode path that allows using
picture cache tiles as the backdrop source for blends where that
is appropriate (most of the underlying work is in previous patches
or the dependencies of this bug).
In addition to avoiding an extra intermediate surface for blends
that are on a picture cache surface, it also avoids constant
invalidation of picture cache tiles due to the blend container
not being part of the main content scroll root.
As an example of the typical performance improvement, the GPU times
on an AMD 5700 GPU at 4k, when using the Firelux color temperature addon
browsing pages drops from ~1.8ms to ~0.3 ms.
Assignee | ||
Comment 8•2 months ago
|
||
I ran into an issue with this patch on Windows + DirectComposition + ANGLE. Write up below.
Background
The mix-blend-mode optimizations I've been working on rely on the ability to do blits from picture cache tiles to render target textures.
Problem
This seems to be a bit tricky on Windows + DirectComposition + ANGLE. With my current patches, we get an assert at [1]. This is because the SRV has not been initialized for the renderbuffer. I believe this is because the D3D texture does not have the D3D11_BIND_SHADER_RESOURCE
binding flag set (it does have the DXGI_USAGE_SHADER_INPUT
flag set on the resource though).
We can't control the creation or flags of the D3D texture, the pointer to the texture is returned to us by the BeginDraw
API in DirectComposition that we use to start drawing on a picture cache tile.
Options
There are a few hacks / workarounds I tried, that prove it can be made to work. It's just a matter of working out which is the best option (or if there's a better option I haven't found yet).
a) Further down the method in [1], there is a fast path that uses CopySubresourceRegion
[2]. The only thing that is stopping us hitting that fast path is that the target texture is DXGI_FORMAT_R8G8B8A8_UNORM, while the virtual surface backing texture is created as DXGI_FORMAT_B8G8R8A8_UNORM [3]. If I change the virtual surface to be DXGI_FORMAT_R8G8B8A8_UNORM, and remove the assertion checks in (1), we hit the fast path in that method and it all seems to work. However, I'm not sure if there's any issues with changing the virtual surface format (e.g. with video surfaces?), and we'd need to patch ANGLE so that those asserts don't fire if the fast path is hit (since the readSRV
is not used in those cases). It's also a little fragile that it breaks if we don't hit the fast path, so I'm not sure if that's upstream-able?
b) I tried to change from creating a renderbuffer from the EGLImage to creating a texture that could be attached to the FBO at [4]. This fails validation for the same reason above, that it's not possible to create an SRV
for the texture we get back from DirectComposition. Maybe there is another way to create an FBO via ANGLE which is bound to the D3D texture that we are able to read back from?
c) glCopyImageSubData
is a GL4 (and ES 3.2) function that looks like it would do what we want (direct copy from a renderbuffer to a texture). It seems to be at least partially implemented in ANGLE [5] though I haven't tried it out yet (and I'm not sure if it's guaranteed always available when running ANGLE). Not sure if the specific renderbuffer -> texture impl is there either.
Questions
- Of the above, do any of those seem like a reasonable potential solution?
- Are there other options we could try to make this work in ANGLE (and hit a fast path)?
[1] https://searchfox.org/mozilla-central/source/gfx/angle/checkout/src/libANGLE/renderer/d3d/d3d11/Renderer11.cpp#3500
[2] https://searchfox.org/mozilla-central/source/gfx/angle/checkout/src/libANGLE/renderer/d3d/d3d11/Renderer11.cpp#3624
[3] https://searchfox.org/mozilla-central/source/gfx/webrender_bindings/DCLayerTree.cpp#595
[4] https://searchfox.org/mozilla-central/source/gfx/webrender_bindings/DCLayerTree.cpp#1004
[5] https://searchfox.org/mozilla-central/source/gfx/angle/checkout/src/libGLESv2/entry_points_gles_3_2_autogen.cpp#173
Assignee | ||
Updated•2 months ago
|
Comment 9•2 months ago
•
|
||
BlitFramebuffer rb->rb is absolutely supposed to work, but likely all (single-sampled) CreateRenderbuffer RBs in ANGLE have D3D11_BIND_SHADER_RESOURCE. I'm not sure why the resource has DXGI_USAGE_SHADER_INPUT but the D3D11Texture doesn't have D3D11_BIND_SHADER_RESOURCE.
There's a risk that going BGRA->RGBA is a less efficient path. There's no real reason this can't be just as efficient, but it happens sometimes in drivers.
I'm surprised that we're blitting from a DComp texture! Why is the picture cache tile a dcomp surface? (Maybe I'm misunderstanding what "picture cache tile" is?) Is this a blit-from-dcomp;edit;blit-back-to-dcomp?
We could upstream something to handle the true-renderbuffer-with-no-D3D11_BIND_SHADER_RESOURCE path.
It sounds like we need to have the formats actually match in D3D, so I doubt CopyImageSubData does any better here. Apparently this is something where GL is more permissive via BlitFramebuffer than D3D is via CopySubresourceRegion, unless I'm reading the docs here wrong for D3D.
Assignee | ||
Comment 10•2 months ago
|
||
Thanks! Do you have suggestions on the correct D3D APIs to use to implement the "We could upstream something to handle the true-renderbuffer-with-no-D3D11_BIND_SHADER_RESOURCE path." bit?
Updated•2 months ago
|
Comment 11•2 months ago
|
||
We'd need to create an intermediary surface that's src_format but with D3D11_BIND_SHADER_RESOURCE, copy to that via CopySubResourceRegion, and sample from that when ANGLE does its draw-blit slow-path.
Comment 12•2 months ago
|
||
Pushed by gwatson@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/23bd3cec7843 Improve performance of mix-blend-mode. r=nical
Comment 13•2 months ago
|
||
bugherder |
Assignee | ||
Updated•2 months ago
|
Assignee | ||
Comment 14•2 months ago
|
||
Comment on attachment 9201950 [details]
Bug 1684781 - Improve performance of mix-blend-mode.
Beta/Release Uplift Approval Request
- User impact if declined: This fixes a visual bug when scrolling pages with mix-blend content, and also offers a large performance win on any content that makes heavy use of mix-blend-mode (e.g. color temperature extensions).
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: Bug 1694982
- Risk to taking this patch: Medium
- Why is the change risky/not risky? (and alternatives if risky): The change does enable new code paths for each platform, however the change itself is small so can be easily reverted if needed. It's also been in nightly for nearly a week now - one regression was found which is fixed in 1694982 (also needs uplift) - which has been in nightly a few days.
- String changes made/needed:
Assignee | ||
Comment 15•2 months ago
|
||
Let's hold off on that beta uplift request while I investigate the regression above.
Assignee | ||
Comment 16•2 months ago
|
||
This is a fairly complex issue since at least two recent patches seem to interact in some ways to make this occur on certain conditions / hardware. I tried to explain my findings at https://bugzilla.mozilla.org/show_bug.cgi?id=1695807#c2.
Assignee | ||
Updated•2 months ago
|
Updated•2 months ago
|
Comment 17•1 month ago
|
||
I guess we're letting this ride the trains after all?
Assignee | ||
Comment 18•1 month ago
|
||
Yup, we found a simple fix for the advanced mix blend regression, which we've uplifted, so we'll let this performance improvement ride the trains.
Description
•