Closed Bug 1684781 Opened 4 months ago Closed 2 months ago

Improve performance of mix-blend

Categories

(Core :: Graphics: WebRender, task)

task

Tracking

()

RESOLVED FIXED
88 Branch
Tracking Status
firefox88 --- fixed

People

(Reporter: gw, Assigned: gw)

References

(Depends on 1 open bug)

Details

Attachments

(3 files)

No description provided.
Depends on: 1682365
Blocks: 1684354
Severity: -- → S3
Depends on: 1683440
Assignee: nobody → gwatson
Keywords: leave-open

Restructure how frame building handles building of the surface
render task structures.

In future parts of this bug, a surface may have multiple "sub-passes"
where the rendering is separated into different render tasks that
write to the same target (without clear on subsequent sub-passes).

For example, this will be used to allow mix-blend-mode and
backdrop-filter to issue readbacks directly from picture cache
tiles rather than forcing draws to an intermediate surface.

The add_child_render_task method will be expanded to support
the case of a surface having multiple sub-passes, adding the
render task to the correct sub-pass.

Pushed by gwatson@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f7b7b8b13507
Pt 1 - Refactor SurfaceRenderTasks r=nical

Instead, top level tile cache pictures are stored in the scene.

Follow up tasks in this bug will be simplified by having pictures
only exist when they have Some(..) for requested_composite_mode.

This patch removes one case of a pass-through picture, and
simplfies some of the surrounding code in the process.

Pushed by gwatson@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d123c4ed95cb
Pt 2 - Remove root_pic_index. r=nical
Depends on: 1687394
Depends on: 1687409
Depends on: 1687573
Depends on: 1687604
Depends on: 1687863
Depends on: 1690396
Summary: Improve performance of mix-blend and backdrop-filter → Improve performance of mix-blend

This patch enables the faster mix-blend-mode path that allows using
picture cache tiles as the backdrop source for blends where that
is appropriate (most of the underlying work is in previous patches
or the dependencies of this bug).

In addition to avoiding an extra intermediate surface for blends
that are on a picture cache surface, it also avoids constant
invalidation of picture cache tiles due to the blend container
not being part of the main content scroll root.

As an example of the typical performance improvement, the GPU times
on an AMD 5700 GPU at 4k, when using the Firelux color temperature addon
browsing pages drops from ~1.8ms to ~0.3 ms.

I ran into an issue with this patch on Windows + DirectComposition + ANGLE. Write up below.

Background

The mix-blend-mode optimizations I've been working on rely on the ability to do blits from picture cache tiles to render target textures.

Problem

This seems to be a bit tricky on Windows + DirectComposition + ANGLE. With my current patches, we get an assert at [1]. This is because the SRV has not been initialized for the renderbuffer. I believe this is because the D3D texture does not have the D3D11_BIND_SHADER_RESOURCE binding flag set (it does have the DXGI_USAGE_SHADER_INPUT flag set on the resource though).

We can't control the creation or flags of the D3D texture, the pointer to the texture is returned to us by the BeginDraw API in DirectComposition that we use to start drawing on a picture cache tile.

Options

There are a few hacks / workarounds I tried, that prove it can be made to work. It's just a matter of working out which is the best option (or if there's a better option I haven't found yet).

a) Further down the method in [1], there is a fast path that uses CopySubresourceRegion [2]. The only thing that is stopping us hitting that fast path is that the target texture is DXGI_FORMAT_R8G8B8A8_UNORM, while the virtual surface backing texture is created as DXGI_FORMAT_B8G8R8A8_UNORM [3]. If I change the virtual surface to be DXGI_FORMAT_R8G8B8A8_UNORM, and remove the assertion checks in (1), we hit the fast path in that method and it all seems to work. However, I'm not sure if there's any issues with changing the virtual surface format (e.g. with video surfaces?), and we'd need to patch ANGLE so that those asserts don't fire if the fast path is hit (since the readSRV is not used in those cases). It's also a little fragile that it breaks if we don't hit the fast path, so I'm not sure if that's upstream-able?

b) I tried to change from creating a renderbuffer from the EGLImage to creating a texture that could be attached to the FBO at [4]. This fails validation for the same reason above, that it's not possible to create an SRV for the texture we get back from DirectComposition. Maybe there is another way to create an FBO via ANGLE which is bound to the D3D texture that we are able to read back from?

c) glCopyImageSubData is a GL4 (and ES 3.2) function that looks like it would do what we want (direct copy from a renderbuffer to a texture). It seems to be at least partially implemented in ANGLE [5] though I haven't tried it out yet (and I'm not sure if it's guaranteed always available when running ANGLE). Not sure if the specific renderbuffer -> texture impl is there either.

Questions
- Of the above, do any of those seem like a reasonable potential solution?
- Are there other options we could try to make this work in ANGLE (and hit a fast path)?

[1] https://searchfox.org/mozilla-central/source/gfx/angle/checkout/src/libANGLE/renderer/d3d/d3d11/Renderer11.cpp#3500
[2] https://searchfox.org/mozilla-central/source/gfx/angle/checkout/src/libANGLE/renderer/d3d/d3d11/Renderer11.cpp#3624
[3] https://searchfox.org/mozilla-central/source/gfx/webrender_bindings/DCLayerTree.cpp#595
[4] https://searchfox.org/mozilla-central/source/gfx/webrender_bindings/DCLayerTree.cpp#1004
[5] https://searchfox.org/mozilla-central/source/gfx/angle/checkout/src/libGLESv2/entry_points_gles_3_2_autogen.cpp#173

Flags: needinfo?(jgilbert)

BlitFramebuffer rb->rb is absolutely supposed to work, but likely all (single-sampled) CreateRenderbuffer RBs in ANGLE have D3D11_BIND_SHADER_RESOURCE. I'm not sure why the resource has DXGI_USAGE_SHADER_INPUT but the D3D11Texture doesn't have D3D11_BIND_SHADER_RESOURCE.

There's a risk that going BGRA->RGBA is a less efficient path. There's no real reason this can't be just as efficient, but it happens sometimes in drivers.

I'm surprised that we're blitting from a DComp texture! Why is the picture cache tile a dcomp surface? (Maybe I'm misunderstanding what "picture cache tile" is?) Is this a blit-from-dcomp;edit;blit-back-to-dcomp?

We could upstream something to handle the true-renderbuffer-with-no-D3D11_BIND_SHADER_RESOURCE path.

It sounds like we need to have the formats actually match in D3D, so I doubt CopyImageSubData does any better here. Apparently this is something where GL is more permissive via BlitFramebuffer than D3D is via CopySubresourceRegion, unless I'm reading the docs here wrong for D3D.

Flags: needinfo?(jgilbert)

Thanks! Do you have suggestions on the correct D3D APIs to use to implement the "We could upstream something to handle the true-renderbuffer-with-no-D3D11_BIND_SHADER_RESOURCE path." bit?

Flags: needinfo?(jgilbert)
Depends on: 1691859, 1692031

We'd need to create an intermediary surface that's src_format but with D3D11_BIND_SHADER_RESOURCE, copy to that via CopySubResourceRegion, and sample from that when ANGLE does its draw-blit slow-path.

Flags: needinfo?(jgilbert)
Depends on: 1692095
Depends on: 1692355
Pushed by gwatson@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/23bd3cec7843
Improve performance of mix-blend-mode. r=nical
Status: NEW → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED
No longer blocks: 1684354
Regressions: 1694982

Comment on attachment 9201950 [details]
Bug 1684781 - Improve performance of mix-blend-mode.

Beta/Release Uplift Approval Request

  • User impact if declined: This fixes a visual bug when scrolling pages with mix-blend content, and also offers a large performance win on any content that makes heavy use of mix-blend-mode (e.g. color temperature extensions).
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: Bug 1694982
  • Risk to taking this patch: Medium
  • Why is the change risky/not risky? (and alternatives if risky): The change does enable new code paths for each platform, however the change itself is small so can be easily reverted if needed. It's also been in nightly for nearly a week now - one regression was found which is fixed in 1694982 (also needs uplift) - which has been in nightly a few days.
  • String changes made/needed:
Attachment #9201950 - Flags: approval-mozilla-beta?
Regressions: 1695807

Let's hold off on that beta uplift request while I investigate the regression above.

This is a fairly complex issue since at least two recent patches seem to interact in some ways to make this occur on certain conditions / hardware. I tried to explain my findings at https://bugzilla.mozilla.org/show_bug.cgi?id=1695807#c2.

Attachment #9201950 - Flags: approval-mozilla-beta?

I guess we're letting this ride the trains after all?

Target Milestone: --- → 88 Branch

Yup, we found a simple fix for the advanced mix blend regression, which we've uplifted, so we'll let this performance improvement ride the trains.

Regressions: 1701361
Regressions: 1704478
You need to log in before you can comment on or make changes to this bug.