Closed
Bug 938395
Opened 11 years ago
Closed 10 years ago
Optimize our DrawThebesLayer invalid region for Direct2D
Categories
(Core :: Layout, defect, P3)
Tracking
()
RESOLVED
FIXED
mozilla32
People
(Reporter: mattwoodrow, Assigned: mattwoodrow)
References
Details
(Keywords: perf, Whiteboard: [qa-])
Attachments
(2 files, 3 obsolete files)
32.24 KB,
image/png
|
Details | |
3.78 KB,
patch
|
bas.schouten
:
review+
Sylvestre
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
Direct2D is fairly slow at drawing when we have a complex (anything other than a single rect) clip applied. It needs to push a temporary layer, draw to that, and then use the clip geometry as a mask when copying back. Bug 934860 changed our ThebesLayer drawing behaviour to instead draw each rect of the invalid region separately to avoid this slow path. Unfortunately it appears that the cost of drawing display items is somewhat proportional to the number of items we draw, and not just the number of pixels filled (which makes sense). This resulted in a considerable regression for TART on the UX branch. One possible solution is to modify the invalid region into something simpler that will result in optimal performance for direct2d. A simple example of a bad case is: Display item 1: 0,0,100,20 Display item 2: 100,0,10,19 We then invalidate both items and up with a region that contains: Rect 1: 0,0,110,19 Rect 2: 0,19,100,1 We then have to paint item 1 twice, and if the overhead of doing so is considerable, then we lost. Jeff and I discussed 3 possible options for improving this: 1) Come up with a new version of SimplifyOutward for nsRegion that optimizes for some other metric of 'simplicity' rather than number of rects. Ideally this would combine our two rects into a single one, but two small rects at opposite ends of the page would remain separate. I haven't really thought much about how to define this exactly. 2) Rather than trying to reconstruct a smart representation during painting, have information similar to 'this display item is entirely invalidated and should be painted in one go' passed down from DLBI. I really have no idea what this representation would look like. There's also some quite (edge?) cases where this approach would come up with awful results. If we had two display items, one very tall and narrow, the other very wide and short and they overlap each other at the top-left corner. In this case the only way to respect the request that both items get painted at once is to expand the region to the entire bounds, potentially dragging in a bunch of new display items. 3) Avoid clipping entirely whenever possible, and just let Direct2D overdraw. If we cull display items that don't intersect the invalid region, then the extra drawing shouldn't matter much since it's on the GPU. However display items that aren't in the invalid region may occlude parts of display items that are. We'd need to make sure we include these items in the list to be painted. Computing this, along with a potentially large number of extra items being included could make this approach worse in some cases. Does anyone have any thoughts on this?
How about for D2D we just fluff out a complex region to its bounding box if the area of the bounding box is not much more (say 2x?) than the area of the region? If you want to be fancier we could count the number of display items that intersect the bounding box vs just the region and fluff out to the bounding box if there aren't too many of them. You could even use any cost metric for display items that you like.
Assignee | ||
Comment 2•11 years ago
|
||
What do you think about this? Seems to work and not regress performance (this is with single rect painting enabled) - https://tbpl.mozilla.org/?tree=Try&rev=70dd9b3af195 (winxp failures are unrelated, and fixed locally) We could go slightly further with this approach and try weight display items more if we think they have a high overhead. One simple options would be to add 2 for items like nsDisplayBoxShadowOuter, anything that paints via an inactive layer etc. I don't know if we have good enough talos tests to really tune this though.
Attachment #8333718 -
Flags: feedback?(jmuizelaar)
Assignee | ||
Comment 3•11 years ago
|
||
If we're going to simplify the draw region out, then we need to also clear the pixels for the entire draw region. This code takes all the surface clears and moves them into DrawThebesLayer, so that we can do it after we decide what to actually draw. Bas: For this to work, we need a way to initialize an area of a DrawTargetDual with black/white. I've hacked up one possibility in this patch, what do you think? I'm totally open to other suggestions.
Attachment #8333720 -
Flags: feedback?(bas)
Comment 4•11 years ago
|
||
So I had a look at how IE solves this problem. Here are some things that I've discovered: - IE never seems to PushLayer for clipping - IE will double draw items that intersect multiple rects in the invalid region - Multiple rectangles are only used if there is a reasonable large difference from the rectangles and the bounds.
Assignee | ||
Comment 5•11 years ago
|
||
So with these patches, we barely regress TART on win7, but regress win8 TART quite considerably. https://tbpl.mozilla.org/?tree=Try&rev=ab6bdef8e52f Interestingly, the first 8 cycles of TART seem to be getting scores very similar to that of trunk, and then we start regressing considerably. For example: Cycle 1 - newtab-open-preload-yes.half.TART: 2.3 Cycle 11 - newtab-open-preload-yes.half.TART: 38.3 Unfortunately, we were only profiling for a single cycle, so this didn't show anything. I tried upping the cycles to 10, and getting profiles for that. https://tbpl.mozilla.org/?tree=Try&rev=d4fe9977da5f This doesn't seem quite as bad as the earlier runs, but some of the results are dropping off badly. Cycle 1 - icon-open-DPI2.half.TART: 3.3 Cycle 10 - icon-open-DPI2.half.TART: 8.6 Profile for this run with markers left in: http://people.mozilla.org/~bgirard/cleopatra/#report=a259d5c3bf34cb38c6618f5a2e2832346910cd0d The range [12765, 12897] covers this test for cycle 10. (Only cycles 9/10 are in the profile, the rest got dropped out of the circular buffer). We can see in this range that we spend 30% of our profile time in NtFlushBuffersFile, more time than we spent painting. It looks like this is causing some frames to take huge amounts longer than the others. But we also appear to just get slower for almost every frame in later cycles, I haven't seen any explanation for that yet.
Assignee | ||
Comment 6•11 years ago
|
||
Oh, I just realised that this might be because of the profiler running, and completely unrelated to the slowdown.
Comment 7•11 years ago
|
||
When using TART, besides the actual animation, most of the tests also open or close an actual browser tab. This has some overhead which affects the results (typically the first few frames). While the animation is hopefully stabilized at the second half of the animation (the .half values), it's not always the case, and also, there could be some GC/CC residues due to the tab close/open, which may also affect the results. When testing graphics efficiency of tab animations, the best tests are the fade tests. These tests only measures css fade in/out (the actual css animation) of an existing newly opened tab. The fade tests are typically very uniform in interval durations through the entire Animation, including the first frames, and also typically between repetitions of the same animation (the tests which open/close tabs have higher variance than the fade tests). So I'd suggest to stick to the fade tests results for better consistency and representation of gfx perf.
Assignee | ||
Comment 8•11 years ago
|
||
Good run: http://people.mozilla.org/~bgirard/cleopatra/#report=68053466bac32fa50d2fee04fabc8484a580fe5b [9799, 9915] Bad run: http://people.mozilla.org/~bgirard/cleopatra/#report=8238fab39cb01fb8277b1d3bd3afe646b95bf452 [14159, 14278] All the extra time is being spent inside compositing with LayerManagerD3D10, inside CDXGISwapChain::Present.
Assignee | ||
Comment 9•11 years ago
|
||
I tried using IDXGIDevice1::SetMaximumFrameLatency(1) in case we were producing marginally faster than the GPU could composite, and it was taking a while before it caught up. No change in results. I'm kind of out of ideas here. Maybe d3d11+OMTC won't be affected by this.
Assignee | ||
Comment 10•11 years ago
|
||
Same issue with d3d11+OMTC. Trying to get a win8 talos machine loaned to me to investigate this further.
Comment 11•11 years ago
|
||
I was able to get an apitrace of this happening. Replaying it on the talos machines produces present times shown in the attached graph. The graph suggests that the problem is being reproduced in the trace. I should be able to bisect the trace to figure out what's going wrong.
Updated•11 years ago
|
Attachment #8349824 -
Attachment is patch: false
Attachment #8349824 -
Attachment mime type: text/plain → image/png
Comment 12•11 years ago
|
||
I also just noticed that during Present() the nvidia driver is calling Sleep(). This is not a good thing for it to be doing.
Updated•10 years ago
|
Assignee: nobody → jmuizelaar
Comment 13•10 years ago
|
||
Nvidia's most recent driver (334.89) fixes this problem.
Assignee | ||
Comment 14•10 years ago
|
||
Did we have any involvement, or was this fixed purely by luck?
Comment 15•10 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #14) > Did we have any involvement, or was this fixed purely by luck? I believe we had some involvement. We reached out to Nvidia, they confirmed there was a bug and said that it would be fixed in the driver released on the 18th. It's possible that it was coincidence that it was fixed in their most recent driver release, I don't know either way.
Assignee | ||
Comment 16•10 years ago
|
||
I rebased the patches from here now that the drivers have been updated. Try run with these patches: https://tbpl.mozilla.org/?tree=Try&rev=bee492a8dae3 Compare talos: http://compare-talos.mattn.ca/?oldRevs=95613a41c9de,a2d71850d783,43ba36bc10f2,727238c13756,db8043427ac2&newRev=bee492a8dae3&server=graphs.mozilla.org&submit=true I also tried using SimplifyRegionByArea(100*100) in ThebesLayerD3D10.cpp (from bug 945079) instead of these patches. Try run: https://tbpl.mozilla.org/?tree=Try&rev=911ff5c77e99 Compare Talos: http://compare-talos.mattn.ca/?oldRevs=95613a41c9de,a2d71850d783,43ba36bc10f2,727238c13756,db8043427ac2&newRev=911ff5c77e99&server=graphs.mozilla.org&submit=true The latter avoids a win7 TART regression, and also improves tsvgx considerably. It doesn't improve win8 TART quite as much though. I think we should try get bug 945079 landed and just go with that, since it's simpler and faster most of the time. What do you think Jeff?
Flags: needinfo?(jmuizelaar)
Assignee | ||
Updated•10 years ago
|
Attachment #8333718 -
Attachment is obsolete: true
Attachment #8333718 -
Flags: feedback?(jmuizelaar)
Assignee | ||
Updated•10 years ago
|
Attachment #8333720 -
Attachment is obsolete: true
Attachment #8333720 -
Flags: feedback?(bas)
Assignee | ||
Comment 17•10 years ago
|
||
Attachment #8419041 -
Flags: review?(bas)
Flags: needinfo?(jmuizelaar)
Comment 18•10 years ago
|
||
Comment on attachment 8419041 [details] [diff] [review] Enable single rects Review of attachment 8419041 [details] [diff] [review]: ----------------------------------------------------------------- Very soon we'll be switching to OMTC with D3D11, we should take care to fix it there too. ::: gfx/layers/d3d10/ThebesLayerD3D10.cpp @@ +411,5 @@ > } else { > destinationSurface = mD2DSurface; > } > > + aRegion.SimplifyOutwardByArea(100*100); nit: spaces around operator ::: layout/base/FrameLayerBuilder.cpp @@ +3698,5 @@ > { > static bool sPaintRectsSeparately; > static bool sPaintRectsSeparatelyPrefCached = false; > if (!sPaintRectsSeparatelyPrefCached) { > + mozilla::Preferences::AddBoolVarCache(&sPaintRectsSeparately, "layout.paint_rects_separately", true); Please just add this to the gfxPrefs code.
Attachment #8419041 -
Flags: review?(bas) → review-
Assignee | ||
Comment 19•10 years ago
|
||
Assignee: jmuizelaar → matt.woodrow
Attachment #8419041 -
Attachment is obsolete: true
Attachment #8419065 -
Flags: review?(bas)
Updated•10 years ago
|
Attachment #8419065 -
Flags: review?(bas) → review+
Assignee | ||
Comment 20•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/d1155f994a3a
Comment 21•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/d1155f994a3a
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
Comment 22•10 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #19) > Created attachment 8419065 [details] [diff] [review] > Enable single rects v2 We should make sure we do the region simplification for OMTC as well.
Assignee | ||
Comment 23•10 years ago
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #22) > (In reply to Matt Woodrow (:mattwoodrow) from comment #19) > > Created attachment 8419065 [details] [diff] [review] > > Enable single rects v2 > > We should make sure we do the region simplification for OMTC as well. Filed bug 1013759 for this.
Comment 24•10 years ago
|
||
Comment on attachment 8419065 [details] [diff] [review] Enable single rects v2 [Approval Request Comment] Bug caused by (feature/regressing bug #): IE 10 update User impact if declined: Bug 812695 remains Testing completed (on m-c, etc.): Has been on m-c for quite a while Risk to taking this patch (and alternatives if risky): Some risk but it's easy to back out.
Attachment #8419065 -
Flags: approval-mozilla-aurora?
Updated•10 years ago
|
status-firefox31:
--- → affected
status-firefox32:
--- → fixed
Updated•10 years ago
|
Attachment #8419065 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 26•10 years ago
|
||
Matt, is this something that needs further testing from QA, or does it have automated tests? I'm just looking at this because it came up as fixed and I'm not sure how to test it. Thanks!
Flags: needinfo?(matt.woodrow)
Assignee | ||
Comment 27•10 years ago
|
||
(In reply to Liz Henry :lizzard from comment #26) > Matt, is this something that needs further testing from QA, or does it have > automated tests? I'm just looking at this because it came up as fixed and > I'm not sure how to test it. Thanks! I don't think it needs further testing. It's a very broad change to how we paint, so bugs in it will show up anywhere/everywhere. I can't think of any specific ways to test the changes beyond all the testing we do already.
Flags: needinfo?(matt.woodrow)
You need to log in
before you can comment on or make changes to this bug.
Description
•