Significant Zimbra perf regression due to d2d

NEW
Assigned to

Status

()

defect
9 years ago
7 years ago

People

(Reporter: jst, Assigned: jst)

Tracking

(Blocks 1 bug)

Trunk
x86
Windows 7
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking2.0 .x+)

Details

(Whiteboard: [softblocker][final?][approved-patches-landed], )

Attachments

(2 attachments)

We've taken a major hit (2x) on a decent number of the tests in the zimbra perf test suite (non-public, available internally at Mozilla), and if I disable d2d (by setting "gfx.direct2d.disabled" to true) the performance goes back to what I would expect it to be. Joe tells me that this is likely due to how d2d interoperates d3d, with Bas is working on, so this bug is primarily to track that Bas' work really does fix the regression as seen here as well.

For anyone looking at the graphs in the url field, ignore the first test, it's very unpredictable, but the rest are very repeatable. If anyone wants to run these tests themselves, let me know and I can tell you where they can be loaded.
Blocking betaN for now, but blocking anything else is fine with me, as long as it blocks 2.0 :)
blocking2.0: --- → betaN+
I'm not so sure if this is caused by interop (it's possible, I just see no evidence of that). It all depends on how Zimbra does it's drawing, it's certainly trivial for me to come up with some code that draws 5 times slower on D2D than it does on GDI.

If Zimbra is triggering a lot of those scenarios, that would certainly be a problem. What for example could currently be very slow is any code that uses 'special' CSS borders. It's possible that that has to do with it, I'm not sure how much border drawing paths get drawn, we should just profile these tests.
Border drawing is certainly one of the things I remember seeing in the painting parts of the Zimbra profiles.

What border styles are problematic for d2d?  I can try to write a patch to disable those somehow so Johnny can test this theory.
(In reply to comment #3)
> Border drawing is certainly one of the things I remember seeing in the painting
> parts of the Zimbra profiles.
> 
> What border styles are problematic for d2d?  I can try to write a patch to
> disable those somehow so Johnny can test this theory.

Anything that uses clip or OPERATOR_ADD (which is a -lot- of the border code), we should try just commenting all those clips and OPERATOR_ADDs out. And then see how performance improves, and use that as a guideline. This might actually be the perfect test case giving us some factual evidence our border drawing code is harmful for Direct2D.

We should simply stop jumping through all kinds of nasty hoops to do anti-aliasing on border corners for example, at least when using D2D. Jeff tells me no other browser actually does this.

The clips should be avoidable mostly just by doing the drawing differently, that should greatly improve the performance for D2D but it -might- regress the performance for GDI so we need to look at that.

There's bug 588271 to track this issue.
Depends on: 588271
I can comment out some clips and test.  What does it mean to comment out OPERATOR_ADD?  Just replace with OVER or SOURCE?
(In reply to comment #5)
> I can comment out some clips and test.  What does it mean to comment out
> OPERATOR_ADD?  Just replace with OVER or SOURCE?

Over would do fine!
Posted patch Quick hackSplinter Review
I _think_ this kills the relevant OPERATOR_ADDs and clips.  jst, do you want to give this a spin?
I think the Clip() removals there potentially hurt.  FWIW, I'm perfectly OK with changing ADD to OVER, and living with the artifacts at intersections.  ADD/OVER has nothing to do with rounded borders; the reason why ADD is used is so that the intersection between two border edges that have different colors is handled correctly.  The artifacts can be seen here: https://bug368247.bugzilla.mozilla.org/attachment.cgi?id=254651 .  Compare our rendering and Chrome's (ignore that ours have border radius and theirs don't, the testcase just uses -moz-border-radius).  Specifically, the third testcase -- note the dark colors at the edges.
Yeah, I was about to say that.  What happens if you only make the s/ADD/OVER/ change and keep the Clip() calls?
We could certainly try that, I'm surprised removing the clips wouldn't help though, they helped a lot in my local test. But let's with all the clips in and just ADD replaced by OVER.
With the patch being only the change from OPERATOR_ADD to OPERATOR_OVER we do get a speedup on the zimbra tests, but only a very small one. Results in the graph linked to in this bug...
Johnny, did you ever end up doing a test with D2D only and software only?
We should also retest with the patches in bug 599118 (possibly just once it lands?)
Depends on: 599118
(In reply to comment #14)
> We should also retest with the patches in bug 599118 (possibly just once it
> lands?)

It landed :).
Is the standard deviation on some tests rather large? ADD will -never- be faster than OVER, yet on the go to 'Tasks' tests has the OVER fix show slower than the original ADD measurement.
> Is the standard deviation on some tests rather large? 

Depends on how you measure "large".  Not compared to the size of the regression we're seeing.  It does look like the difference between OVER and ADD for the "go to Tasks" test is just noise.
Test results now up that show the performance right before Bas' fix for bug 599118. With that fix we're doing better, but still far worse than with d2d disabled. Those numbers are w/o changing ADD to OVER. Check the url in this bug, look at the sheet named "2010-09-24 bas' d2d fix".
(In reply to comment #18)
> Test results now up that show the performance right before Bas' fix for bug
> 599118.

... and right after, and then also with d2d disabled in that latter build.
(In reply to comment #19)
> (In reply to comment #18)
> > Test results now up that show the performance right before Bas' fix for bug
> > 599118.
> 
> ... and right after, and then also with d2d disabled in that latter build.

I might be looking at the wrong place, but I can't see the results anywhere? :)
You need to switch to the second tab from the left on the bottom of the spreadsheet.  The one labeled "2010-09-24 bas' d2d fix".
(In reply to comment #21)
> You need to switch to the second tab from the left on the bottom of the
> spreadsheet.  The one labeled "2010-09-24 bas' d2d fix".

Wow, this is fascinating, these results suggests my fix made some test cases worse.. that's really.. well, unexpected ... impossible almost. Although I guess the noise could be a bit higher than expected. Well we should try this with D3D10 layers, (I don't expect that to fix it though!) and then profile it to see what's taking the time.
We really do need this run with d2d only (accelerated layers off) and software only (hw accel disabled).

layers.accelerate-none;true
Options dialog > Advanced > General > Use hardware acceleration where available.
Any progress on getting us numbers as I requested in comment 23? Also, testing with the d3d10 backend would be instructive.
Assignee: nobody → jst
Under D2D, we seem to be spending roughly 75% of our painting time drawing borders. Painting accounts for roughly 30% of our processing time.

30% of our processing time seems to be spent doing reflows.
Note that the testcase is measuring wall-clock time, not processing time, and it does so across running the event loop.  So anything that causes us to block will also show up in the times it measures.

30% CPU time under reflow sounds about right to me based on profiling this on Mac, for what it's worth...
This patch takes time taken by border drawing rendering to practically 0. It seems they decided to, in -addition- to specifying transparent border colors, specify moz-border-*-colors as transparent.

Our comment here stated we don't check for that, because it's kind of silly, but I guess it's better if we do, it gives Zimbra a huge painting perf boost. This patch adds that check and updates the comment.
Attachment #486506 - Flags: review?(vladimir)
Attachment #486506 - Flags: approval2.0?
Comment on attachment 486506 [details] [diff] [review]
Avoid more transparent border drawing

Nice catch!
Attachment #486506 - Flags: review?(vladimir)
Attachment #486506 - Flags: review+
Attachment #486506 - Flags: approval2.0?
Attachment #486506 - Flags: approval2.0+
With the landing of http://hg.mozilla.org/mozilla-central/rev/83e5fdf07535 this should have improved a lot! It would be great if we can retest with that patch in.
(In reply to comment #27)
> This patch takes time taken by border drawing rendering to practically 0. It
> seems they decided to, in -addition- to specifying transparent border colors,
> specify moz-border-*-colors as transparent.

Probably because of bug 482692?
Ok, good news, the patch that Bas landed (comment 29) made a huge difference here. I updated the google spreadsheet linked to in this bug with charts showing how we perform with no changes, with hw accel disabled (in the prefs UI), with d2d disabled, and with layer acceleration disabled. I ran these tests with a build built from 20101105, which means the fix wasn't included, once I realized that I re-ran with layer acceleration disabled in a build and included those results in the spreadsheet too and the graph speaks for itself. The last set of data in that graph (gray) shows that the last 10 tests now perform as well as they used to, but there's possibly something still going on with the "go to'*'" tests, I'll need to repeat this with the new build with a clean profile to know...

The results are in the sheet named "hw accel".
(In reply to comment #31)
> Ok, good news, the patch that Bas landed (comment 29) made a huge difference
> here. I updated the google spreadsheet linked to in this bug with charts
> showing how we perform with no changes, with hw accel disabled (in the prefs
> UI), with d2d disabled, and with layer acceleration disabled. I ran these tests
> with a build built from 20101105, which means the fix wasn't included, once I
> realized that I re-ran with layer acceleration disabled in a build and included
> those results in the spreadsheet too and the graph speaks for itself. The last
> set of data in that graph (gray) shows that the last 10 tests now perform as
> well as they used to, but there's possibly something still going on with the
> "go to'*'" tests, I'll need to repeat this with the new build with a clean
> profile to know...
> 
> The results are in the sheet named "hw accel".

Yeah, the 'go to' results are interesting! I'd much like to see the results with layer accel on as well (like you say, clean profile will do it!). It's good to see we're making considerable progress it seems.
I re-ran the numbers with todays nightly (so pgo optimized build vs not in my previous results) with a clean profile and with d2d disabled. Something's definitely going on with some of the "go to '*'" tests, but other than that this is looking good! I added the results to the sheet "hw accel 20101112" in the same spreadsheet.
(In reply to comment #33)
> I re-ran the numbers with todays nightly (so pgo optimized build vs not in my
> previous results) with a clean profile and with d2d disabled. Something's
> definitely going on with some of the "go to '*'" tests, but other than that
> this is looking good! I added the results to the sheet "hw accel 20101112" in
> the same spreadsheet.

I'm also happy to see that it seems this patch gives us some improvements on no-accel builds as far as I can see (as expected!)
With the landing of 588271, this has hopefully improved further.
Whiteboard: softblocker
Does this bug still exist? The improved border rendering code should have landed and it shouldn't. (but it might still)
Whiteboard: softblocker → [soft blocker]
Whiteboard: [soft blocker] → [softblocker]
Is there a good reason that this blocks betaN?  If not, it should be moved over to final+.  (Note: comment 1!)
Whiteboard: [softblocker] → [softblocker][final?]
Has this patch landed? Should it? Has it bitrotten, entirely?
It landed.
I believe this bug does not exist anymore after the improvements we made, but I don't have access to the test suite so I can't check.
jst: can you retest this please?
** PRODUCT DRIVERS PLEASE NOTE **

This bug is one of 7 automatically changed from blocking2.0:betaN+ to blocking2.0:.x during the endgame of Firefox 4 for the following reasons:

 - it was marked as a soft blocking issue for a beta
 - the whiteboard indicated that it might be appropriate for a .x release
blocking2.0: betaN+ → .x+
Whiteboard: [softblocker][final?] → [softblocker][final?][approved-patches-landed]
You need to log in before you can comment on or make changes to this bug.