Open Bug 1466835 Opened 7 years ago Updated 2 years ago

Hardware acceleration on Windows slows down masked SVG rendering by factor 2 to 4

Categories

(Core :: Graphics, defect, P2)

61 Branch
x86_64
Windows
defect

Tracking

()

UNCONFIRMED

People

(Reporter: jan.boesenberg, Unassigned)

References

()

Details

(Keywords: perf, platform-parity, testcase, Whiteboard: [gfx-noted])

Attachments

(2 files)

Attached file ha-perf-test.html
User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36

Steps to reproduce:

I created a performance test that adds 500 SVGs to the document and constantly changes their color. Each of the SVG elements has the same mask applied. The test displays a frame rate for performance comparison between browser settings.

Here is the same testcase on CodePen: https://codepen.io/anon/pen/qKZpGR?editors=1010

Note that if your machine is too fast, you may have to change "elementCount" to a higher value, otherwise fps will always be at 60.

Actual results:

When disabling hardware acceleration, the frame rate improves by a factor of 2 to 4 (depending on the PC). Here are the results from three different machines:

Windows 10 on dated Business Laptop (NVIDIA Quadro-FX 880M)
Chrome 67: 32fps
Firefox 60.0.1 (HA enabled): 25fps
Firefox 60.0.1 (HA disabled): 52fps

Windows 10 with NVIDIA GeForce GTX 660
Chrome 67: 30fps
Firefox 60.0.1 (HA enabled): 20fps
Firefox 60.0.1 (HA disabled): 51fps

Windows 7:
Chrome 69 (Canary): 44 fps
Firefox 62.0a1 (HA enabled): 7.5 fps
Firefox 62.0a1 (HA disabled): 30 fps

On Mac there is no performance difference between hardware acceleration enabled and disabled.

Expected results:

Hardware acceleration should at least not have a negative impact on performance. In the test case, enabling hardware acceleration slows down performance by a factor of 2 to 4.

I tried a modified test replacing the mask with a filter and had similar results, so this is not limited to SVG masks.
Runtime analysis with hardware acceleration enabled vs. disabled shows that drawing takes much more time when it is enabled. Also, there is a CC graph reduction when hardware acceleration is enabled, but this only takes a small amount of time.
Component: Untriaged → Graphics
Keywords: perf, pp, testcase
OS: Unspecified → Windows
Product: Firefox → Core
Hardware: Unspecified → x86_64
Bas, is this an area where we are being hurt by not using idiomatic Direct2D?
Flags: needinfo?(bas)
Whiteboard: [gfx-noted]
I believe it is, but haven't profiled it yet.

This seems similar to bug 1455118, though there we appeared to be particularly slow due to component alpha, and it looks like here we're just slow since there's so many masks pushed.

The conclusion from the discussion Bas and I had about the reddit page was that D2D generally does require a pushed layer to mask content, and that pushed layers are slow.

To do better, we'd have to detect when the mask can be represented as something simpler than just an image, and when the masked content can draw that directly.

For the reddit example, the mask image is an alpha gradient mask, and the masked content should just be text filled with a solid colour. It should be possible to collapse this down to a GradientPattern object that represents both the mask gradient and the text fill colour, and draw the text using that.

In this example it appears that the 'masks' are just rectangular clips, so we should be able to draw the inner content with a rectangular clip pushed instead.

We discussed implementing this at the display list level. nsDisplayMask would need code to retrieve the mask as a pattern/clip instead of just a surface, with fallback if that's not possible. We'd then have something similar to CanApplyOpacity, where we check if our children display items support being drawn with the given clip/pattern, and if they all can, then we can flatten ourselves away.

It's also possible that doing analysis on the OMTP recorded command stream would work for this.

I also notice in the profile from bug 1455118 comment 5, that most of the slowness comes from D2DDeviceContextBase::SetTarget. It looks like we push a layer, and create a command stream, and it's switching the current target to that command stream when we trigger a flush. I don't understand why we need the command stream, there's no comments in the original bug, and looking quickly at the D2D docs don't show any need. Maybe Bas remembers why we do this?
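For illustration only, the "flatten ourselves away" check described above could look roughly like the sketch below. Every type and method name here (Rect, Item, CanApplyClip, MaskItem, CanFlatten) is hypothetical and not part of Gecko; the sketch only shows the shape of the decision: flatten when the mask reduces to a simple clip and every child can draw with that clip, otherwise fall back to the surface path.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <vector>

// Hypothetical stand-ins; none of these are real Gecko types.
struct Rect {
  float x, y, w, h;
};

struct Item {
  virtual ~Item() = default;
  // Similar in spirit to CanApplyOpacity: can this item draw itself with
  // an extra rectangular clip instead of needing a pushed layer?
  virtual bool CanApplyClip() const = 0;
};

struct SolidFillItem : Item {
  bool CanApplyClip() const override { return true; }
};

struct BlendGroupItem : Item {
  // Assumed to need its group contents, so it cannot take the clip directly.
  bool CanApplyClip() const override { return false; }
};

struct MaskItem {
  // Set only when the mask is known to be equivalent to a rectangular clip.
  std::optional<Rect> simpleClip;
  std::vector<std::unique_ptr<Item>> children;

  // True when the mask item can be dropped and replaced by a pushed clip.
  bool CanFlatten() const {
    if (!simpleClip) {
      return false;  // Mask is a general image: keep the surface fallback.
    }
    for (const auto& child : children) {
      assert(child != nullptr);
      if (!child->CanApplyClip()) {
        return false;  // One unsupported child forces the fallback.
      }
    }
    return true;
  }
};
```

A single child that cannot take the clip forces the fallback for the whole mask item, which mirrors the all-or-nothing behaviour described for CanApplyOpacity above.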
@mattwoodrow Thanks for looking into this. The test case is a very simplified version of the original scenario, in which the masks have alpha channels and all kinds of weird shapes. I only used solid rectangles for simplification, because the negative performance effect could also be reproduced this way.

So in short, using a rectangular clip may solve the issue with this test case, but not with our real world scenario.
Profiled this a bit:

(In reply to Matt Woodrow (:mattwoodrow) from comment #3)
> I also notice in the profile from bug 1455118 comment 5, that most of the
> slowness comes from D2DDeviceContextBase::SetTarget. It looks like we push a
> layer, and create a command stream, and it's switching the current target to
> that command stream when we trigger a flush. I don't understand why we need
> the command stream, there's no comments in the original bug, and looking
> quickly at the D2D docs don't show any need. Maybe Bas remembers why we do
> this?

Commenting out the command stream allocation improves the fps by about 50% for me. I think we need it sometimes, but maybe we can skip it for the simpler cases.

EnsureLuminanceEffect takes about 10% of the paint thread. This looks to be allocating a generic effect for the pixel conversion and seems like we shouldn't need to be reallocating it for every rectangle.

CreateSimilarDrawTarget for the mask surface takes about 25% of the paint thread. Allocations suck :( We could try to do something better here, maybe create an atlas for all the offscreen surfaces, maybe just pool the allocations.

About 10% goes to flushing the draw commands for the mask surface content. As above, since this is just filling a rectangle it's pretty pointless, but this might be necessary for harder use cases.

PushLayer takes about 15%, looks like it's flushing previous commands and this is time spent doing the actual mask operation for the previous primitive.

12% on the manual Flush at the end of painting, again, mainly just doing the mask operations.

11% releasing our command stream, freeing the luminance effect mainly I think.
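As a rough sketch of the "just pool the allocations" idea for the mask surfaces, assuming a surface can be reused once the mask operation that consumed it has been flushed: Surface and SurfacePool below are hypothetical stand-ins and not Moz2D types, and the real CreateSimilarDrawTarget path has format and device constraints that this ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an offscreen draw target; not a Moz2D type.
struct Surface {
  int width;
  int height;
};

class SurfacePool {
 public:
  // Return a pooled surface that is large enough, or allocate a new one.
  std::unique_ptr<Surface> Acquire(int width, int height) {
    assert(width > 0 && height > 0);
    for (std::size_t i = 0; i < mFree.size(); ++i) {
      if (mFree[i]->width >= width && mFree[i]->height >= height) {
        std::unique_ptr<Surface> hit = std::move(mFree[i]);
        mFree.erase(mFree.begin() + static_cast<std::ptrdiff_t>(i));
        return hit;
      }
    }
    ++mAllocations;  // Only counted when the pool could not satisfy the request.
    return std::make_unique<Surface>(Surface{width, height});
  }

  // Hand a surface back once the mask operation that used it has been flushed.
  void Release(std::unique_ptr<Surface> surface) {
    mFree.push_back(std::move(surface));
  }

  std::size_t Allocations() const { return mAllocations; }

 private:
  std::vector<std::unique_ptr<Surface>> mFree;
  std::size_t mAllocations = 0;
};
```

With 500 same-sized masks per frame, a pool like this would allocate at most once per surface that is in flight at the same time, rather than once per mask. An atlas would go further by also removing the render target switches.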
Note on my machine, I get:

Chrome: 60fps
Edge: 20fps
Firefox: 10fps

I'm not sure this particular testcase is something we want to optimize for, or something Edge did optimize for. This particular test case seems easy to optimize for, so it's possible Chrome does.

To try and prevent optimizations, I tried to make this test case a little more complicated, see: https://codepen.io/anon/pen/KeMeEp?editors=1010

It sadly seems that text doesn't get masked in SVG (am I doing something wrong here?), so I'm not sure this makes it more complicated. In any case, Chrome's perf drops a bit for me with that (40fps). Firefox reduces further to about 6fps, Edge seems to stay the same (but has some drawing artifacts! :P)

Fwiw, whenever this test case is optimized properly, it just amounts to drawing rapidly, so it would be really fast. The question is whether those cases are worth optimizing for, as there are obviously better ways to draw rectangles.
Flags: needinfo?(bas) → needinfo?(matt.woodrow)
(In reply to Matt Woodrow (:mattwoodrow) from comment #3)
> I also notice in the profile from bug 1455118 comment 5, that most of the
> slowness comes from D2DDeviceContextBase::SetTarget. It looks like we push a
> layer, and create a command stream, and it's switching the current target to
> that command stream when we trigger a flush. I don't understand why we need
> the command stream, there's no comments in the original bug, and looking
> quickly at the D2D docs don't show any need. Maybe Bas remembers why we do
> this?

We need it because when we're doing operations that require things like blending we sometimes need to be able to get at the contents of the layer/group, this sadly seems to be the only way, it's possible more recent D2D versions have a way to eliminate this.
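To make the "skip it for the simpler cases" idea concrete, here is a minimal sketch of deciding per group whether the command list is needed. It rests on an assumption that is not confirmed anywhere in this bug: that only operations with a non-default blend mode need to read back the group contents. Blend, RecordedOp and GroupNeedsCommandList are hypothetical names, not Moz2D API.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical blend modes and recorded ops; not the Moz2D enums.
enum class Blend { SourceOver, Multiply, Screen };

struct RecordedOp {
  Blend blend;
};

// Assumption: only ops with a non-default blend mode need to get at the
// group's contents, so only they force the command list to be created.
bool GroupNeedsCommandList(const std::vector<RecordedOp>& group) {
  return std::any_of(group.begin(), group.end(), [](const RecordedOp& op) {
    return op.blend != Blend::SourceOver;
  });
}
```

A check like this needs the whole group to be known before the layer is pushed, which is only the case when the commands have been recorded ahead of time.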
> I'm not sure this particular testcase is something we want to optimize for,
> or something Edge did optimize for.

Thanks again for looking into this. The intention of the test case was not to have a solution for this special scenario of rectangular masks on rectangular shapes. Our real world scenario is much more complex, with masks in all different shapes, alpha channels, gradients, nested masks, filters, etc. The reason why I used rectangles in this test case is that I wanted to make the test case as simple as possible while still being able to demonstrate the performance impact.

Here is a test case using random ellipses for shapes and masks: https://codepen.io/anon/pen/dKXjZa?editors=1010
(In reply to Bas Schouten (:bas.schouten) from comment #6)
> Fwiw, whenever this test case is optimized properly, it just amounts to
> drawing rapidly, so it would be really fast, the question is whether those
> cases are worth optimizing for, as there's obviously better ways to draw
> rectangles.

I am not sure if the main point of this bug report came across properly. On Windows, rendering any SVG that has a mask or filter applied is slowed by a factor of 2 to 4 if hardware acceleration is enabled. If hardware acceleration is disabled, Firefox is in the same ballpark as Chrome, but with hardware acceleration enabled, performance drops almost to the level of Internet Explorer 11 (at least on the machine that I am working on).

I did not intend to ask for a performance optimization on a specific test case; instead, I wanted to report a rather generic performance regression. Sorry if the test case was misleading.
(In reply to Bas Schouten (:bas.schouten) from comment #7)
> We need it because when we're doing operations that require things like
> blending we sometimes need to be able to get at the contents of the
> layer/group, this sadly seems to be the only way, it's possible more recent
> D2D versions have a way to eliminate this.

Ok good, that's what I saw from looking at the code.

With OMTP we have a recording of all the commands within the group, so we have a way to determine if the group will ever need to use the recording. I'm not sure how we'd communicate that to D2D through the Moz2D API in a sane way, but it should be possible if we decide it's important.

If we treat this bug as a generic 'masking is slow bug', then I think what we could do is:

* Reorder the OMTP commands stream so that all offscreen surface rendering is done first, to reduce the number of render target switches.
* Do offscreen surface rendering into large atlas render targets to further reduce render target switches, allocations and releases.
* Cache the luminance effect to reduce allocations/releases.
* Avoid the command list when we can.

I think we'd be pretty quick if we did that, and those are all fairly general purpose optimizations that should help all masking testcases.
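The first item in that list can be illustrated with a toy model of a recorded command stream. This is only a sketch under simplifying assumptions: Command, CountTargetSwitches and OffscreenFirst are hypothetical, each offscreen surface is assumed to be distinct and to never read from the main target, and the real OMTP recording is considerably richer than a list of target ids.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical recorded command: only the render target it draws into.
// Target 0 stands for the main target; any other id is an offscreen surface.
struct Command {
  int target;
};

// Count how often consecutive commands draw into different targets.
std::size_t CountTargetSwitches(const std::vector<Command>& commands) {
  std::size_t switches = 0;
  for (std::size_t i = 1; i < commands.size(); ++i) {
    if (commands[i].target != commands[i - 1].target) {
      ++switches;
    }
  }
  return switches;
}

// Move all offscreen commands in front of the main-target commands,
// keeping the relative order inside each of the two groups.
std::vector<Command> OffscreenFirst(std::vector<Command> commands) {
  std::stable_partition(commands.begin(), commands.end(),
                        [](const Command& c) { return c.target != 0; });
  return commands;
}
```

For an interleaved stream of three masks (targets 1, 0, 2, 0, 3, 0) this takes the switch count from 5 to 3. Rendering all masks into one shared atlas target, the second item in the list, would bring it down to a single switch in this model.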
Flags: needinfo?(matt.woodrow)
(sounds like moderately important work?)
Priority: -- → P2
I did some further testing with complex SVGs and found that the issue is not limited to masks. Even after removing all masks from the SVGs, they were rendered considerably slower with hardware acceleration enabled.
I created a modification of the test case that uses filters instead of masks. The result is the same: hardware acceleration on Windows slows down rendering of the SVGs by a factor of 2 to 4. This means that the problem is not limited to masks, but also affects SVGs that use filters.

Here is the modified test case: https://codepen.io/anon/pen/wYYvpr?editors=1010

I made this file to demonstrate how bottom tier the performance is with this. I even enabled shape-rendering:optimizeSpeed in the hopes it would do anything for performance.

(In reply to redcodefinal@gmail.com from comment #14)
> Created attachment 9138334 [details]
> An example of how bad this slowdown is
>
> I made this file to demonstrate how bottom tier the performance is with this.
> I even enabled shape-rendering:optimizeSpeed in the hopes it would do
> anything for performance.

Ok, interesting update: the slowdown is much more pronounced at large sizes. For example, click the attached image directly, then open it via details, and you can see that the slowdown is considerable when the image is larger.

Severity: normal → S3