Closed Bug 938395 Opened 12 years ago Closed 11 years ago

Optimize our DrawThebesLayer invalid region for Direct2D

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla32

Tracking Flags:

Tracking

Status

firefox31

---

fixed

firefox32

---

fixed

People

(Reporter: mattwoodrow, Assigned: mattwoodrow)

References

Details

(Keywords: perf, Whiteboard: [qa-])

Attachments

(2 files, 3 obsolete files)

Simplify the invalid region if we think it will be faster 12 years ago Matt Woodrow (:mattwoodrow) 7.83 KB, patch		Details \| Diff \| Splinter Review
Move clearing of surfaces into DrawThebesLayer 12 years ago Matt Woodrow (:mattwoodrow) 31.37 KB, patch		Details \| Diff \| Splinter Review
Present times on talos machine 12 years ago Jeff Muizelaar [:jrmuizel] 32.24 KB, image/png		Details
Enable single rects 11 years ago Matt Woodrow (:mattwoodrow) 2.02 KB, patch	bas.schouten : review-	Details \| Diff \| Splinter Review
Enable single rects v2 11 years ago Matt Woodrow (:mattwoodrow) 3.78 KB, patch	bas.schouten : review+ Sylvestre : approval-mozilla-aurora+	Details \| Diff \| Splinter Review

Matt Woodrow (:mattwoodrow)

Assignee

Description

•

12 years ago

Direct2D is fairly slow at drawing when we have a complex (anything other than a single rect) clip applied. It needs to push a temporary layer, draw to that, and then use the clip geometry as a mask when copying back. Bug 934860 changed our ThebesLayer drawing behaviour to instead draw each rect of the invalid region separately to avoid this slow path. Unfortunately it appears that the cost of drawing display items is somewhat proportional to the number of items we draw, and not just the number of pixels filled (which makes sense). This resulted in a considerable regression for TART on the UX branch. One possible solution is to modify the invalid region into something simpler that will result in optimal performance for direct2d. A simple example of a bad case is: Display item 1: 0,0,100,20 Display item 2: 100,0,10,19 We then invalidate both items and up with a region that contains: Rect 1: 0,0,110,19 Rect 2: 0,19,100,1 We then have to paint item 1 twice, and if the overhead of doing so is considerable, then we lost. Jeff and I discussed 3 possible options for improving this: 1) Come up with a new version of SimplifyOutward for nsRegion that optimizes for some other metric of 'simplicity' rather than number of rects. Ideally this would combine our two rects into a single one, but two small rects at opposite ends of the page would remain separate. I haven't really thought much about how to define this exactly. 2) Rather than trying to reconstruct a smart representation during painting, have information similar to 'this display item is entirely invalidated and should be painted in one go' passed down from DLBI. I really have no idea what this representation would look like. There's also some quite (edge?) cases where this approach would come up with awful results. If we had two display items, one very tall and narrow, the other very wide and short and they overlap each other at the top-left corner. In this case the only way to respect the request that both items get painted at once is to expand the region to the entire bounds, potentially dragging in a bunch of new display items. 3) Avoid clipping entirely whenever possible, and just let Direct2D overdraw. If we cull display items that don't intersect the invalid region, then the extra drawing shouldn't matter much since it's on the GPU. However display items that aren't in the invalid region may occlude parts of display items that are. We'd need to make sure we include these items in the list to be painted. Computing this, along with a potentially large number of extra items being included could make this approach worse in some cases. Does anyone have any thoughts on this?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 1

•

12 years ago

How about for D2D we just fluff out a complex region to its bounding box if the area of the bounding box is not much more (say 2x?) than the area of the region? If you want to be fancier we could count the number of display items that intersect the bounding box vs just the region and fluff out to the bounding box if there aren't too many of them. You could even use any cost metric for display items that you like.

Matt Woodrow (:mattwoodrow)

Assignee

Comment 2

•

12 years ago

Attached patch Simplify the invalid region if we think it will be faster (obsolete) — Details — Splinter Review

What do you think about this? Seems to work and not regress performance (this is with single rect painting enabled) - https://tbpl.mozilla.org/?tree=Try&rev=70dd9b3af195 (winxp failures are unrelated, and fixed locally) We could go slightly further with this approach and try weight display items more if we think they have a high overhead. One simple options would be to add 2 for items like nsDisplayBoxShadowOuter, anything that paints via an inactive layer etc. I don't know if we have good enough talos tests to really tune this though.

Attachment #8333718 - Flags: feedback?(jmuizelaar)

Matt Woodrow (:mattwoodrow)

Assignee

Comment 3

•

12 years ago

Attached patch Move clearing of surfaces into DrawThebesLayer (obsolete) — Details — Splinter Review

If we're going to simplify the draw region out, then we need to also clear the pixels for the entire draw region. This code takes all the surface clears and moves them into DrawThebesLayer, so that we can do it after we decide what to actually draw. Bas: For this to work, we need a way to initialize an area of a DrawTargetDual with black/white. I've hacked up one possibility in this patch, what do you think? I'm totally open to other suggestions.

Attachment #8333720 - Flags: feedback?(bas)

Jeff Muizelaar [:jrmuizel]

Comment 4

•

12 years ago

So I had a look at how IE solves this problem. Here are some things that I've discovered: - IE never seems to PushLayer for clipping - IE will double draw items that intersect multiple rects in the invalid region - Multiple rectangles are only used if there is a reasonable large difference from the rectangles and the bounds.

Matt Woodrow (:mattwoodrow)

Assignee

Comment 5

•

12 years ago

So with these patches, we barely regress TART on win7, but regress win8 TART quite considerably. https://tbpl.mozilla.org/?tree=Try&rev=ab6bdef8e52f Interestingly, the first 8 cycles of TART seem to be getting scores very similar to that of trunk, and then we start regressing considerably. For example: Cycle 1 - newtab-open-preload-yes.half.TART: 2.3 Cycle 11 - newtab-open-preload-yes.half.TART: 38.3 Unfortunately, we were only profiling for a single cycle, so this didn't show anything. I tried upping the cycles to 10, and getting profiles for that. https://tbpl.mozilla.org/?tree=Try&rev=d4fe9977da5f This doesn't seem quite as bad as the earlier runs, but some of the results are dropping off badly. Cycle 1 - icon-open-DPI2.half.TART: 3.3 Cycle 10 - icon-open-DPI2.half.TART: 8.6 Profile for this run with markers left in: http://people.mozilla.org/~bgirard/cleopatra/#report=a259d5c3bf34cb38c6618f5a2e2832346910cd0d The range [12765, 12897] covers this test for cycle 10. (Only cycles 9/10 are in the profile, the rest got dropped out of the circular buffer). We can see in this range that we spend 30% of our profile time in NtFlushBuffersFile, more time than we spent painting. It looks like this is causing some frames to take huge amounts longer than the others. But we also appear to just get slower for almost every frame in later cycles, I haven't seen any explanation for that yet.

Matt Woodrow (:mattwoodrow)

Assignee

•

12 years ago

I tried using IDXGIDevice1::SetMaximumFrameLatency(1) in case we were producing marginally faster than the GPU could composite, and it was taking a while before it caught up. No change in results. I'm kind of out of ideas here. Maybe d3d11+OMTC won't be affected by this.

Mats Palmgren (inactive)

Updated

•

12 years ago

Keywords: perf

Priority: -- → P3

Matt Woodrow (:mattwoodrow)

Assignee

Updated

•

12 years ago

Depends on: 943048

Matt Woodrow (:mattwoodrow)

Assignee

Comment 10

•

12 years ago

Same issue with d3d11+OMTC. Trying to get a win8 talos machine loaned to me to investigate this further.

Avi Halachmi (:avih)

•

12 years ago

I also just noticed that during Present() the nvidia driver is calling Sleep(). This is not a good thing for it to be doing.

Jeff Muizelaar [:jrmuizel]

Updated

•

11 years ago

Attached patch Enable single rects v2 — Details — Splinter Review

Assignee: jmuizelaar → matt.woodrow

Attachment #8419041 - Attachment is obsolete: true

Attachment #8419065 - Flags: review?(bas)

Bas Schouten (:bas.schouten)

Updated

•

11 years ago

Attachment #8419065 - Flags: review?(bas) → review+

Matt Woodrow (:mattwoodrow)

Assignee

Comment 20

•

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/d1155f994a3a

Carsten Book [:Tomcat]

Comment 21

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/d1155f994a3a

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla32

Jeff Muizelaar [:jrmuizel]

Comment 22

•

11 years ago

(In reply to Matt Woodrow (:mattwoodrow) from comment #19) > Created attachment 8419065 [details] [diff] [review] > Enable single rects v2 We should make sure we do the region simplification for OMTC as well.

(mostly gone) XtC4UaLL [:xtc4uall]

Updated

•

11 years ago

Depends on: 1008573

Alice0775 White

•

11 years ago

(In reply to Jeff Muizelaar [:jrmuizel] from comment #22) > (In reply to Matt Woodrow (:mattwoodrow) from comment #19) > > Created attachment 8419065 [details] [diff] [review] > > Enable single rects v2 > > We should make sure we do the region simplification for OMTC as well. Filed bug 1013759 for this.

Jeff Muizelaar [:jrmuizel]

Comment 24

•

11 years ago

Comment on attachment 8419065 [details] [diff] [review] Enable single rects v2 [Approval Request Comment] Bug caused by (feature/regressing bug #): IE 10 update User impact if declined: Bug 812695 remains Testing completed (on m-c, etc.): Has been on m-c for quite a while Risk to taking this patch (and alternatives if risky): Some risk but it's easy to back out.

Attachment #8419065 - Flags: approval-mozilla-aurora?

Sylvestre Ledru [:Sylvestre]

Updated

•

11 years ago

status-firefox31: --- → affected

status-firefox32: --- → fixed

Sylvestre Ledru [:Sylvestre]

Updated

•

11 years ago

Attachment #8419065 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Ryan VanderMeulen [:RyanVM]

Comment 25

•

11 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/943469b0defe

status-firefox31: affected → fixed

Liz Henry (:lizzard) (relman/hg->git project)

Comment 26

•

11 years ago

Matt, is this something that needs further testing from QA, or does it have automated tests? I'm just looking at this because it came up as fixed and I'm not sure how to test it. Thanks!

Flags: needinfo?(matt.woodrow)

Matt Woodrow (:mattwoodrow)

Assignee

Comment 27

•

11 years ago

(In reply to Liz Henry :lizzard from comment #26) > Matt, is this something that needs further testing from QA, or does it have > automated tests? I'm just looking at this because it came up as fixed and > I'm not sure how to test it. Thanks! I don't think it needs further testing. It's a very broad change to how we paint, so bugs in it will show up anywhere/everywhere. I can't think of any specific ways to test the changes beyond all the testing we do already.

Flags: needinfo?(matt.woodrow)

u279076

Updated

•

11 years ago

Whiteboard: [qa-]

You need to log in before you can comment on or make changes to this bug.