Closed Bug 859377 Opened 11 years ago Closed 11 years ago

Improve ClippedImage performance (regressions in Ts Paint, Tp5, TResize, SVG Opacity)

Categories

(Core :: Graphics: ImageLib, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla23
Tracking Status
firefox21 --- unaffected
firefox22 --- unaffected
firefox23 + fixed

People

(Reporter: seth, Assigned: seth)

References

Details

(Keywords: perf, regression)

Attachments

(5 files, 1 obsolete file)

Bug 826093 introduced ClippedImage, which performs image clipping by twiddling Image::Draw parameters whenever possible, instead of copying the clipped area of the image into a new surface. This should be a major win in the long term, especially once we gain an API for source clipping on backends that support it. However, it currently causes Talos regressions on Windows (and doesn't perform as well as it could on other platforms) because when ClippedImage _is_ forced to create a surface (due to tiling, for example), it doesn't save that surface across calls.

This bug is about fixing two issues:

- We should memoize any surface that ClippedImage creates. As long as the parameters to Draw and the animation state of the underlying image haven't changed, we should keep it around.
- ClippedImage should be able to detect, as much as possible, cases where the underlying Image will cause a temporary surface to be created. Right now there are some cases that are hard to detect from ClippedImage. It'd be good to characterize those cases, and do what needs to be done (possibly including tweaking the Draw API) to ensure that ClippedImage can memoize those cases when possible.
See bug 826093 comment 59 for the Talos regressions that need to be fixed.
In addition to the Ts, Tp, Paint, and TResize regressions already noted in bug 826093, this also caused a 200% regression in SVG Opacity:

http://graphs.mozilla.org/graph.html#tests=[[225,63,12]]&sel=1364962588994.152,1365432198580.3293&displayrange=90&datatype=running
Keywords: perf, regression
Summary: Improve ClippedImage performance → Improve ClippedImage performance (regressions in Ts Paint, Tp5, TResize, SVG Opacity)
Nominating for tracking-firefox23 because this is a major regression in several of our performance measurements.
Moving this bug up in my priorities.
mbrubeck, dholbert and myself discussed the SVG Opacity regression on IRC and we came away still unsure of how the regression happened (though it does seem attributable to bug 826093, as far as I can tell). We don't yet understand the mechanism of action, since the SVG Opacity test shouldn't hit the code paths that the patch affects. My best guess at this point is that possible there is a negative interaction if painting the UI ends up requiring more temporary surfaces. (Since the UI _does_ use border image. I checked and that is the only use of border image or -moz-image-rect in the SVG Opacity test; the test itself does not use them.) This try job has a hack that is meant to see if memoization will help:

https://tbpl.mozilla.org/?tree=Try&rev=ab62862733ed
Here's the perf-o-matic record of the SVG Opacity test's performance history, for comparison:

http://graphs.mozilla.org/graph.html#tests=[[225,63,12],[225,131,12]]&sel=1358023044077,1365799044077&displayrange=90&datatype=running
So, it looks like memoization will fix the problem. Performance with memoization is the same as performance before the patch was introduced. Now I just need to make a clean memoization patch.
I've pushed a second attempt at this to try. This patch includes cache invalidation and other goodies that will be needed to actually make this work. It also _removes_ the support for caching ImageContainers that the first patch had. I need to think a bit more about that, and at any rate I'd like to see whether it actually matters for Talos.

Here's the try job: https://tbpl.mozilla.org/?tree=Try&rev=0db559fada38
Just so I don't have to keep looking at the perf-o-matic results, I'm noting here that I'm regarding a "good" score for the SVG Opacity test on Windows 7 non-PGO as ~200. The regressed score that we're trying to fix here is ~470.

My latest patch also gets a score of ~200, so that's very good news. It needs a little more work before it's ready for review. I'm about to embark on that now.
Blocks: 764299
Blocks: 600207
To invalidate the cache correctly we need a little more information about animation state.
Attachment #741539 - Flags: review?(joe)
Attachment #741539 - Flags: superreview?(bzbarsky)
In Part 3 we'll end up storing SVGImageContext in a Maybe<T>. This means we can't mark it as a stack class any longer. We also need to add a way to test for the equality of SVGImageContexts as part of the cache invalidation process.
Attachment #741585 - Flags: review?(dholbert)
Using the code added in parts 1 and 2, we can cache the temporary surfaces produced by ClippedImage and invalidate the cache correctly. Note that this patch doesn't include support for GetImageContainer; I've decided that the risk for that is too high to do it as a part of this bug, so that will happen elsewhere.

With this patch (and the preceding ones) applied, the regressed Talos tests should be back to the level of performance they had before bug 826093.
Attachment #741605 - Flags: review?(joe)
Comment on attachment 741585 [details] [diff] [review]
(Part 2) - Make SVGImageContext more flexible.

Looks good. One thought:

>-class MOZ_STACK_CLASS SVGImageContext
>+class SVGImageContext

Since this keyword-removal is such a small change (and yet a somewhat-significant change that we wouldn't want to take on its own), I'd marginally prefer that it happen as part of the patch that actually requires it (part 3)

But it doesn't really matter; r=me either way.
Attachment #741585 - Flags: review?(dholbert) → review+
Blocks: 865518
The issue of GetImageContainer is explored in bug 865518.
Here's a Talos try job for the final (modulo reviews) version of the patch:

https://tbpl.mozilla.org/?tree=Try&rev=c9a5cca8b316
Thanks for the review, Daniel!
Comment on attachment 741539 [details] [diff] [review]
(Part 1) - Add imgIContainer::GetFrameIndex.

sr=me
Attachment #741539 - Flags: superreview?(bzbarsky) → superreview+
Comment on attachment 741539 [details] [diff] [review]
(Part 1) - Add imgIContainer::GetFrameIndex.

Review of attachment 741539 [details] [diff] [review]:
-----------------------------------------------------------------

::: image/src/RasterImage.cpp
@@ +1605,5 @@
> +{
> +  MOZ_ASSERT(aWhichFrame <= FRAME_MAX_VALUE, "Invalid argument");
> +  return (aWhichFrame == FRAME_FIRST || !mAnim)
> +         ? 0.0f
> +         : mAnim->currentAnimationFrameIndex;

oh my goodness if would look so much nicer
Attachment #741539 - Flags: review?(joe) → review+
Comment on attachment 741605 [details] [diff] [review]
(Part 3) - Make ClippedImage cache temporary surfaces.

Review of attachment 741605 [details] [diff] [review]:
-----------------------------------------------------------------

::: image/src/ClippedImage.cpp
@@ +42,5 @@
> +  {
> +    bool matchesSVGContext = (!aSVGContext && mSVGContext.empty())
> +                          || *aSVGContext == mSVGContext.ref();
> +    return mViewportSize == aViewportSize
> +        && matchesSVGContext

&&, || on previous line

@@ +226,5 @@
> +                                                  aSVGContext,
> +                                                  frameToDraw,
> +                                                  aFlags)) {
> +    // Create a surface to draw into.
> +    gfxImageSurface::gfxImageFormat format = gfxASurface::ImageFormatARGB32;

You should either inline this in the call to CreateOffscreenSurface or use it in the call to DrawPixelSnapped. :)

Is there any reason you don't just call GetFrameInternal here?
Attachment #741605 - Flags: review?(joe) → review+
Thanks for the reviews, Joe. I'll upload updated patches shortly. (Though I gotta tell you, I can't understand your lack of enthusiasm for the ternary operator. =)

> Is there any reason you don't just call GetFrameInternal here?

Yeah, because this code is in the implementation for GetFrameInternal. =) Just sanity checking this comment in case it got attached to the wrong place by Splinter.
So I've just looked over the SVG Opacity results that are posted in this bug over the various iterations of this patch, and here's how things look to me:

- Prior to bug 826093, we had been able to get ~200 for SVG Opacity on Windows 7 non-PGO.
- With bug 826093 only, we were typically getting ~460.
- With the first patch in this bug, which always cached the image and also implemented GetImageContainer, we were back to ~200.
- With the final patch, which only caches the image in the case where we needed to create a temporary surface in ClippedImage, and does not implement GetImageContainer, we're at ~260.

So caching has gotten us a long way, and it's worth going ahead and checking in this change, but it hasn't gotten us the whole way by itself.

There are three factors that could explain the remaining discrepancy, individually or together: _always_ caching, GetImageContainer, and the fact that the first patch bypassed some of the drawing logic in ClippedImage. These things are rather interrelated and it's hard to tease them apart. I'll try to put together some experiments to clarify this a bit.
Here's a Talos run where GetImageContainer isn't implemented, but MustCreateSurface always returns true, so we always cache.

https://tbpl.mozilla.org/?tree=Try&rev=f66c964b5928
Made changes based on review comments.
Attachment #741605 - Attachment is obsolete: true
Here's a Talos run where GetImageContainer is implemented, but we don't always cache. (This means GetImageContainer only works when we cache, at least right now.)

https://tbpl.mozilla.org/?tree=Try&rev=0920c606378e
Here's the composition of the previous two. We always cache, and GetImageContainer is implemented. (Since we always cache, it works all the time.)

https://tbpl.mozilla.org/?tree=Try&rev=82ea1c752955
I'll go ahead and push in the r+'d patches so far, but marking this [leave open] until we learn a bit more.
Whiteboard: [leave open]
So we have the Talos results back now. Recall that we're shooting for ~200, and the current patch gets us to ~260.

When we always cache, we get 185, surprisingly enough. I can't believe that this is the right choice for performance in general, though.

When we implement GetImageContainer for only those cases where we cache, oddly enough we still manage to get 188!

The composition of the two yields 184.

I've retriggered all of these tests so we can get a bit more data; these are results from only one run.

Right now, given this information, I'm inclined to attempt to figure out the cache invalidation problems with the GetImageContainer implementation and get that code ready to land. We will then hopefully be slightly _better_ off than we were at the beginning of this whole thing.
Sorry, but it looks like the three patches that landed caused serious regressions in both the time and memory benchmarks from Tp5 on Windows 7 and 8 (including a 120% increase in latency on Windows 7 in both PGO and non-PGO builds) but not on any other platforms:
https://groups.google.com/d/topic/mozilla.dev.tree-management/J-chchil6dM/discussion

The same regression showed up when those patches were pushed to Try:
http://perf.snarkfest.net/compare-talos/index.html?oldRevs=07e17dd7813b&newRev=c9a5cca8b316&submit=true

Can we back these out to fix these new regressions?
Might be a good idea to needinfo when it's important like this :)
Flags: needinfo?(seth)
Looking into this now; will reach a decision on what to do shortly.

I'm trying to understand how these numbers have changed across the various patches that have been tried in this bug, so I can get a sense for where the problems are coming from.
Flags: needinfo?(seth)
For my own info, I'm copying here the original regressions that inspired that this bug be opened in the first place. This is the same as bug 826093 comment 59, but I've added some inline commentary.

> Could this be causing the large TResize talos regressions?

> Regression: Mozilla-Inbound-Non-PGO - TResize - WINNT 6.2 x64 - 28.6% increase
> Regression: Mozilla-Inbound-Non-PGO - TResize - Win7 - 57.8% increase

The patches from this bug provided some improvement here:

Improvement: Mozilla-Inbound-Non-PGO - TResize - WINNT 6.2 x64 - 17.7% decrease
Improvement: Mozilla-Inbound-Non-PGO - TResize - Win7 - 14.1% decrease 

We're still not back to baseline, though.

> as well as smaller regressions for Paint:
> Regression: Mozilla-Inbound-Non-PGO - Paint - Win7 - 10.6% increase
>
> and Ts, Paint:
> Regression: Mozilla-Inbound-Non-PGO - Ts, Paint - Win7 - 3.63% increase
> Regression: Mozilla-Inbound-Non-PGO - Ts Paint, MAX Dirty Profile - Win7 - 3.52% increase
> Regression: Mozilla-Inbound-Non-PGO - Ts Paint, MED Dirty Profile - Win7 - 3.97% increase

No movement on these so far.
Here's the RSS regression:

> Regression: Mozilla-Inbound-Non-PGO - Tp5 Optimized (Main RSS) - Win7 - 18.4% increase
> --------------------------------------------------------------------------------------
>    Previous: avg 141660083.333 stddev 2349814.517 of 12 runs up to revision 0ae79e3f9e6f
>    New     : avg 167746916.667 stddev 2259581.516 of 12 runs since revision 5a913ab3d2a5
>    Change  : +26086833.333 (18.4% / z=11.102)
>    Graph   : http://mzl.la/10Etgjz 

I expected memory usage to go up after this change. (We're caching surfaces after all!) I'm not sure whether this actually constitutes a problem, assuming that the caching is desirable.

Here's the perf regression:

> Regression: Mozilla-Inbound-Non-PGO - Tp5 Optimized MozAfterPaint - Win7 - 120% increase
> ----------------------------------------------------------------------------------------
>     Previous: avg 1213.500 stddev 75.276 of 12 runs up to revision 0ae79e3f9e6f
>     New     : avg 2670.500 stddev 188.789 of 12 runs since revision 5a913ab3d2a5
>     Change  : +1457.000 (120% / z=19.355)
>     Graph   : http://mzl.la/10EtiIg 

This is a huge regression. I'm looking into this now and will back out if I can't figure it out.
Here's a Talos job with none of the changes in this bug applied. I wanted to run this to give a baseline:

https://tbpl.mozilla.org/?tree=Try&rev=cfe7add46123
Trying to narrow down the issue, here's a small variant of the changes in this bug that prevents caching from ever occurring, but keeps all the other changes. I want to make sure the problem comes from the place it seems to be coming from.

https://tbpl.mozilla.org/?tree=Try&rev=53ebb0f870c7
So the results are in and both of those tests do not exhibit the regression. This means that the regression is directly related to *ClippedImageCachedSurface itself*. A lot of possibilities have now been eliminated - for example, the act of creating the surfaces isn't responsible, as the test in comment 35 does this in exactly the same way as the reviewed version of the patches on this bug, it just doesn't cache them.

I've also taken a look at a number of other possibilities - for example, that we generate excessively large surfaces. None of them have panned out. There is something really weird going on here. I have a hunch that this is a Windows-ism that has been dealt with before; I'm going to spelunk a little and see what other code does.
Note also that the invalidation logic in ClippedImageCachedSurface cannot be responsible, because the version tested in comment 5 does not do invalidation, and we still see the regression. This really does not leave much space for the bug to hide: holding on to the surface we're producing seems to be bad, even though there's no problem if we use it and then throw it away.
Here's an experiment where I force the use of gfxImageSurfaces instead of hardware surfaces on Windows. I think this might fix the problem.

https://tbpl.mozilla.org/?tree=Try&rev=4ff409f4d3e2
Bingo. I could just push the patch above and fix the regression, but after some reflection I think the right thing here is to rewrite ClippedImageCachedSurface to store an imgFrame instead of a raw gfxASurface. imgFrame already has the smarts to avoid issues like these baked in. I don't foresee this taking a long time so I'd rather avoid introducing a temporary patch that might well get completely ripped out on the same day it's pushed in. If things take longer than anticipated, the patch above is always available.
OK, here's a test that uses imgFrames instead of raw gfxASurfaces for the cache. This should eliminate the regression. If this goes well, this will become Part 4.

https://tbpl.mozilla.org/?tree=Try&rev=eeced1846f5d
There was a bug in the previous version that caused a lot of failures. I took the opportunity to make some changes I realized I had intended, but forgot, as well.

https://tbpl.mozilla.org/?tree=Try&rev=10bca78596fc
Blocks: 867009
Here's a Talos run of (hopefully) the final version of the patch. This cleans up some more issues and also permits the use of imgFrame::Optimize on the cached image.

https://tbpl.mozilla.org/?tree=Try&rev=c0fd3788b227
Note that these new patches, while eliminating the TP5 Optimized mozAfterPaint regression, _also_ eliminate virtually all of our SVG opacity gains. Bummer. I suspect that using hardware surfaces was similar in performance to using GetImageContainer, while using software surfaces is not much better in the end than just drawing every time. This would also explain why GetImageContainer didn't seem to add much in the tests above - either way we're using hardware surfaces.

It sucks but I think this stuff needs to cook more; I don't want to push anything in that fixes one problem while reintroducing another. However, I need to take some action immediately, because I suspect my rampant use of hardware surfaces on Windows is the cause of this issue:

http://benjamin.smedbergs.us/blog/2013-04-11/graph-of-the-day-firefox-virtual-memory-plot/

My plan is to push in a tiny patch that switches to software surfaces on Windows and eliminates that issue. This should buy more time to make GetImageContainer safe.
Here's that temporary patch. No review as this is just a bandaid. I'm still aggressively working on a complete fix.

https://hg.mozilla.org/integration/mozilla-inbound/rev/cc82e1599dd0
Blocks: 866526
Blocks: 859955
(In reply to Seth Fowler [:seth] from comment #44)
> It sucks but I think this stuff needs to cook more; I don't want to push
> anything in that fixes one problem while reintroducing another. However, I
> need to take some action immediately, because I suspect my rampant use of
> hardware surfaces on Windows is the cause of this issue:
> 
> http://benjamin.smedbergs.us/blog/2013-04-11/graph-of-the-day-firefox-
> virtual-memory-plot/

Is there any way to confirm this? I've been seeing OOM the past few days, and checking about:memory on my browser before the last one showed that I had almost 4GB of memory committed, but the explicit allocations didn't add up to more than 1GB or so.
Depends on: 866648
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #47)
> Is there any way to confirm this? I've been seeing OOM the past few days,
> and checking about:memory on my browser before the last one showed that I
> had almost 4GB of memory committed, but the explicit allocations didn't add
> up to more than 1GB or so.

The circumstantial evidence strongly points to this being the cause, and bug 866526 comment 12 has confirmed that after pushing Part 4 the huge spike in empty crashes is gone.
Agreed, I was crashing 2-3x daily but haven't at all post-backout.
Here's a test of a new patch stack that implements GetImageContainer for ClippedImage. Let's see how this fares against Talos:

https://tbpl.mozilla.org/?tree=Try&rev=8ed0545b0c70
Hmm, no good. SVG Opacity performance is still poor. Pushed a modification that reintroduces hardware surfaces, but this time managed by imgFrame. Let's see how that goes:

https://tbpl.mozilla.org/?tree=Try&rev=eb04691af13e
Sigh. Let's try that again, but this time forcing GetImageContainer to always succeed for RasterImages.

https://tbpl.mozilla.org/?tree=Try&rev=5245b8047fcb
There was a flaw in that version. Here's a fixed version:

https://tbpl.mozilla.org/?tree=Try&rev=58a95d320dfb
A variant that doesn't call imgFrame::Optimize:

https://tbpl.mozilla.org/?tree=Try&rev=47dc004a66ff
On Friday I pushed a test right before leaving that didn't end up getting posted here. The results are quite promising. By switching from CreateOffscreenSurface to CreateOffscreenDrawTarget, the tp5o_shutdown_paint regression seems to be gone!

https://tbpl.mozilla.org/?tree=Try&rev=4fa753389953
Given https://bugzilla.mozilla.org/show_bug.cgi?id=837835#c56, should we consider speculatively uplifting part 4 to Aurora? Seems appropriate to me, unless the risk is high.
Not only that, but there's also a significant improvement in TResize.

Bas is working on a patch that changes the characteristics of the textures we allocate on Windows. When I combine the patch in comment 58 with Bas's patch, I get this, which seems to fully resolve all the performance regressions from bug 826093.

https://tbpl.mozilla.org/?tree=Try&rev=523239949e7b
(In reply to Alex Keybl [:akeybl] from comment #59)
> Given https://bugzilla.mozilla.org/show_bug.cgi?id=837835#c56, should we
> consider speculatively uplifting part 4 to Aurora? Seems appropriate to me,
> unless the risk is high.

Alex, the bug that these regression were caused by, bug 826093, hit m-c on 4-05-2013. Given that, this issue can't affect Aurora.

Although this bug did cause some of the empty dump crashes, the rest are coming from elsewhere, and those other sources are the ones affecting Aurora.
Bas' patch is part of bug 805406.
Depends on: 805406
This is a real fix that appears to solve virtually all the regressions, and in combination with Bas' patch in bug 805406 will totally restore the status quo. It's identical to the patch tested on try in comment 58 except that I've cleaned it up for review, so another try job shouldn't be needed.
Attachment #746131 - Flags: review?(joe)
At this time I plan to call this fixed if the only remaining regression is in tsvgr_opacity. There's some very strange interaction going on there. I discussed this on IRC briefly with mbrubeck and Bas.

> mbrubeck: Bas: They've always been weirdly multimodal, so there seems to be some sensitivity in the tests... maybe some timing change elsewhere causes a change in whether something else (gc, i/o...) happens before/during/after the test

Matt also pointed me in the direction of this graph, which shows the multimodal nature of the benchmark he mentions above pretty clearly:

http://graphs.mozilla.org/graph.html#tests=[[225,94,12]]&sel=none&displayrange=365&datatype=running

This doesn't seem like something worth optimizing for, especially since fixes that make tsvgr_opacity run better tend to make tsvgr much worse. The previous status quo tilted things in favor of tsvgr_opacity, but I'm inclined to allow them to tilt in favor of tsvgr.
Note that when I run tsvgr_opacity locally, ClippedImage is only called when drawing the browser chrome. So the fact that some of the patches in this bug and Bas' patch in bug 805406 both cause such a wide swing (the score varies from 200 to 400 on Windows 7) is truly bizarre.
(In reply to Seth Fowler [:seth] from comment #63)
> It's identical to the patch tested on try in comment 58 except that
> I've cleaned it up for review, so another try job shouldn't be needed.

, he said optimistically.
Attachment #746131 - Flags: review?(joe) → review+
(In reply to Joe Drew (:JOEDREW! \o/) from comment #66)
> , he said optimistically.

I fully recognize my hubris. =)
Thanks to Joe's review and inbound reopening we are go. This should clean up the regressions (especially in combination with Bas' patch in bug 805406, if we decide to go for that). Hopefully it won't cause a spike in crashes or any other side effects and this bug can be closed.

https://hg.mozilla.org/integration/mozilla-inbound/rev/d571f60b80fd
Looking over Perf-o-Matic it seems that the main goals here have been achieved, although the data is kinda noisy so I'm not totally sure. The performance profile  has changed a bit, but I think it's change for the best (ie. we favor tsvgr over tsvgr_opacity). If bug 805406 lands, that should flip things back to the tradeoffs we used to make, if the consensus is that that would be preferred.

Based on these considerations, I'm marking this fixed and will shortly begin to build more things on top of ClippedImage. If anyone feels that there's more to do here, please reopen.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [leave open]
Target Milestone: --- → mozilla23
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: