Closed Bug 416305 Opened 16 years ago Closed 16 years ago

Speed up Gaussian blur filter considerably by optimizing for alpha-only surfaces

Categories

(Core :: SVG, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: roc, Assigned: roc)

References

Details

(Keywords: perf)

Attachments

(3 files)

It's pretty common to use Gaussian blur with a SourceAlpha source to blur just the alpha channel, to get a shadow. Examples do this often (e.g. the infamous http://ejohn.org/files/ecma-cloud.svg ) and the spec even suggests optimizing for it. It's a really easy optimization to detect a SourceAlpha source here and should speed up such blurs by nearly a factor of 4. Probably worth doing for 1.9 since it's easy, and shadow blurs are common and one of our slowest filter operations.
Flags: blocking1.9?
Keywords: perf
Blocks: 403932
Blocks: 399488
Flags: tracking1.9? → blocking1.9?
roc, do you want to block on this?
No, but we could make it wanted1.9.
Agreed.
Flags: wanted1.9+
Flags: blocking1.9?
Flags: blocking1.9-
Blocks: 421358
Not for 1.9, but I'm working on it.
Flags: wanted1.9-
Flags: wanted1.9+
Flags: wanted-next+
I have a fix for this in my tree, works great. I'll break it out for submission later.
Okay, here's a big patch that restructures filters to lay the groundwork for the optimizations. There are a few things going on here:
-- nsSVGFilterResource is eliminated. Everything the filter primitives need is provided to them directly.
-- Instead of messing around with an "image dictionary", the filter primitive Filter methods are passed Image objects representing their inputs and write to an Image output object. The Image object contains a gfxImageSurface, a copy of the filter primitive subregion that generate the object, and the colormodel.
-- The Filter method takes a new rect parameter indicating which pixels need to be computed. This rectangle is relative to the temporary image surfaces.
-- Most of the filter logic is moved from nsSVGFilterFrame to nsSVGFilterInstance. An nsSVGFilterInstance contains the FilterAnalysis graph that we were building before, and uses that graph for various analyses as well as (now) the actual drawing. The image dictionary no longer exists because all the image dataflow is encoded in the graph now.

I apologise for the big patch, but it actually does simplify the code quite a bit and makes it easy to develop additional analyses of the filter structure for big wins.

This patch was not intended to change behaviour but it actually improved my gaussmark test from 84s to 81s.
Attachment #326837 - Flags: superreview?(mats.palmgren)
Attachment #326837 - Flags: review?(longsonr)
That patch also has a potential big memory usage saving, because it frees image surfaces once we know there are no more uses in the filter pipeline. So a linear pipeline of N primitives goes from N+1 temporary image surfaces to 2.
Attached patch part 2: fixSplinter Review
This fixes this bug. It's a tiny patch on top of the last one.

This reduced the time for my gaussmark test from 81s to 65s. I experimented with hoisting the aAlphaOnly check out --- making a special alpha-only inner loop --- but that did not produce a significant performance improvement. Probably because the branch is extremely well predicted.
Attachment #326838 - Flags: superreview?(mats.palmgren)
Attachment #326838 - Flags: review?(longsonr)
Comment on attachment 326837 [details] [diff] [review]
part 1: restructure filters

>+  result.mSource = new gfxImageSurface(scaledSize,
>+                                       gfxASurface::ImageFormatARGB32);
>+  result.mTarget = new gfxImageSurface(scaledSize,
>+                                       gfxASurface::ImageFormatARGB32);
>+  if (!result.mSource || result.mSource->CairoStatus() ||
>+      !result.mTarget || result.mTarget->CairoStatus())
>+  {

{ should be on the previous line

>+nsSVGFETurbulenceElement::Filter(nsSVGFilterInstance *instance,
>+                                 const nsTArray<const Image*>& aSources,
>+                                 const Image* aTarget,
>+                                 const nsIntRect& rect)
>+{
>+  PRUint8* targetData = aTarget->mImage->Data();
>+  PRUint32 stride = aTarget->mImage->Stride();
>+
>+  nsIntRect filterSubregion(PRInt32(aTarget->mFilterPrimitiveSubregion.X()),
>+                            PRInt32(aTarget->mFilterPrimitiveSubregion.Y()),
>+                            PRInt32(aTarget->mFilterPrimitiveSubregion.Width()),
>+                            PRInt32(aTarget->mFilterPrimitiveSubregion.Height()));

Why not do this via the new GfxRectToIntRect function?

>-  if (x == rect.x) {
>+  if (x == 0) {
>     xflag = 0;
>-  } else if (x == rect.XMost() - 1) {
>+  } else if (x == rect.width - 1) {
>     xflag = 2;
>   } else {
>     xflag = 1;
>   }
>-  if (y == rect.y) {
>+  if (y == 0) {
>     yflag = 0;
>-  } else if (y == rect.YMost() - 1) {
>+  } else if (y == rect.height - 1) {
>     yflag = 2;
>   } else {
>     yflag = 1;
>   }

This looks right but please ensure the testcase for bug 429774 is checked in as a crashtest to prove it.

> protected:
>   virtual PRBool OperatesOnSRGB(nsSVGFilterInstance* aInstance,
>-                                nsSVGString* aString) {
>-    if (aString == &mStringAttributes[IN1]) {
>-      return aInstance->LookupImageColorModel(aString->GetAnimValue()).mColorSpace ==
>-               nsSVGFilterInstance::ColorModel::SRGB;
>-    }
>+                                PRUint32 aInput, Image* aImage) {
>+    if (aInput == 0 && aImage)
>+      return aImage->mColorModel.mColorSpace == ColorModel::SRGB;
> 

just check aImage and ignore aInput to match comment in nsSVGFilters.h

>+nsresult
>+nsSVGFilterInstance::BuildSourceImages()
>+{
...
>+    // Clear the color channel
>+    PRUint32* src = reinterpret_cast<PRUint32*>(sourceColorAlpha->Data());

Nit: prefer const pointer for src.

>+    PRUint32* dest = reinterpret_cast<PRUint32*>(mSourceAlpha.mImage.mImage->Data());
>+    for (PRInt32 yy = 0; yy < mSurfaceRect.height; yy++) {
>+      PRUint32 rowOffset = (mSourceAlpha.mImage.mImage->Stride()*yy) >> 2;
>+      for (PRInt32 xx = 0; xx < mSurfaceRect.width; xx++) {
>+        dest[rowOffset + xx] = src[rowOffset + xx] & 0xFF000000U;
>+      }
>+    }

Nit: Could just use x and y as the variable names now you've split this into separate functions.
Attachment #326837 - Flags: review?(longsonr) → review+
Attachment #326838 - Flags: review?(longsonr) → review+
(In reply to comment #9)
> { should be on the previous line

OK

> Why not do this via the new GfxRectToIntRect function?

Yeah OK

> This looks right but please ensure the testcase for bug 429774 is checked in
> as a crashtest to prove it.

OK

> 
> > protected:
> >   virtual PRBool OperatesOnSRGB(nsSVGFilterInstance* aInstance,
> >-                                nsSVGString* aString) {
> >-    if (aString == &mStringAttributes[IN1]) {
> >-      return aInstance->LookupImageColorModel(aString->GetAnimValue()).mColorSpace ==
> >-               nsSVGFilterInstance::ColorModel::SRGB;
> >-    }
> >+                                PRUint32 aInput, Image* aImage) {
> >+    if (aInput == 0 && aImage)
> >+      return aImage->mColorModel.mColorSpace == ColorModel::SRGB;
> > 
> 
> just check aImage and ignore aInput to match comment in nsSVGFilters.h

Not sure what you mean here. The code wants to leave the first input in its current format, and do the default for the other input and the output. So we need to test both aInput and aImage.

> Nit: prefer const pointer for src.

OK.

> >+    PRUint32* dest = reinterpret_cast<PRUint32*>(mSourceAlpha.mImage.mImage->Data());
> >+    for (PRInt32 yy = 0; yy < mSurfaceRect.height; yy++) {
> >+      PRUint32 rowOffset = (mSourceAlpha.mImage.mImage->Stride()*yy) >> 2;
> >+      for (PRInt32 xx = 0; xx < mSurfaceRect.width; xx++) {
> >+        dest[rowOffset + xx] = src[rowOffset + xx] & 0xFF000000U;
> >+      }
> >+    }
> 
> Nit: Could just use x and y as the variable names now you've split this into
> separate functions.

OK.

Mats, could you look at this? I've got a pretty long queue of stuff that follows this :-).
(In reply to comment #10)
> Not sure what you mean here. The code wants to leave the first input in its
> current format, and do the default for the other input and the output. So we
> need to test both aInput and aImage.
> 

My mistake, I read it wrong. Ignore that comment.
Comment on attachment 326837 [details] [diff] [review]
part 1: restructure filters

sr=mats

> content/svg/content/src/nsSVGFilters.cpp
>+CopyRect(...)
> {
>+  NS_ASSERTION(aDest->mImage->Stride() == aSrc->mImage->Stride(), "stride mismatch");
>+  CopyDataRect(aDest->mImage->Data(), aSrc->mImage->Data(),
>+               aSrc->mImage->Stride(), aDataRect);

Can we assert aDataRect is within bounds?
(if so, in GaussianBlur() too)
Attachment #326837 - Flags: superreview?(mats.palmgren) → superreview+
Comment on attachment 326838 [details] [diff] [review]
part 2: fix

sr=mats

-      SUM(0); SUM(1); SUM(2); SUM(3);
+      if (!aAlphaOnly) { SUM(GFX_ARGB32_OFFSET_B);
+                         SUM(GFX_ARGB32_OFFSET_G);
+                         SUM(GFX_ARGB32_OFFSET_R); }
+      SUM(GFX_ARGB32_OFFSET_A);

Why not use SUM_PIXEL() here too?
Attachment #326838 - Flags: superreview?(mats.palmgren) → superreview+
(In reply to comment #10)
> > Why not do this via the new GfxRectToIntRect function?
> 
> Yeah OK

Actually we can't because GfxRectToIntRect fails if the input rectangle is not integer-aligned, so I left that as-is.

(In reply to comment #12)
> Can we assert aDataRect is within bounds?
> (if so, in GaussianBlur() too)

Yes, OK.
Pushed part 1 with those comments addressed: a36e8ebbe3f8.
(In reply to comment #13)
> (From update of attachment 326838 [details] [diff] [review])
> sr=mats
> 
> -      SUM(0); SUM(1); SUM(2); SUM(3);
> +      if (!aAlphaOnly) { SUM(GFX_ARGB32_OFFSET_B);
> +                         SUM(GFX_ARGB32_OFFSET_G);
> +                         SUM(GFX_ARGB32_OFFSET_R); }
> +      SUM(GFX_ARGB32_OFFSET_A);
> 
> Why not use SUM_PIXEL() here too?

I could, but that would rely on cpp macros not using lexical scope (i.e. that SUM_PIXEL sees the redefinition of SUM, not the SUM in scope when SUM_PIXEL was declared), which seems a little bit too tricky to me.
Pushed part 2: baa5a51b7f90
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
(In reply to comment #16)
FWIW, it looks like C and C++ standards requires it, 6.10.3.4/6.10.3.5:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf

In the examples in 6.10.3.5 they have:

#define x    3
#define f(a) f(x * (a))
#undef  x
#define x    2
...
f(y+1) 

results in f(2 * (y+1))

(I found a C++ draft that have the same text and example)
I agree it's correct, I just prefer to avoid writing code that depends on obscure stuff like that.
I think this broke the shadow of the favicons in my Ctrl-Tab extension (<http://en.design-noir.de/mozilla/ctrl-tab/>):


svg|*.ctrlTab-icon {
  filter: url(chrome://browser/content/browser.xul#ctrlTab-iconShadow);
}

...

          <svg:filter id="ctrlTab-iconShadow">
            <svg:feOffset result="shadow" in="SourceAlpha" dx="2" dy="-1"/>
            <svg:feColorMatrix result="transparent-shadow" in="shadow"
                               values="1 0 0 0   0
                                       0 1 0 0   0
                                       0 0 1 0   0
                                       0 0 0 0.5 0"/>
            <svg:feBlend in="SourceGraphic" in2="transparent-shadow"/>
          </svg:filter>
I also don't see the shadows on <http://ejohn.org/files/ecma-cloud.svg>. I'm on Windows.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I still see shadows on http://ejohn.org/files/ecma-cloud.svg on Windows.
Attached image missing filter
Here's what I see. Works as expected on the 1.9.0 branch.
Dão, you know as well as I do that we don't reopen bugs for regressions, unless we end up backing the patch out.
Status: REOPENED → RESOLVED
Closed: 16 years ago16 years ago
Resolution: --- → FIXED
I see the regression in the Windows nightly build.
I found what I'm pretty sure is the regression, filed it as bug 445268 with a patch. It only shows up on Windows nightly builds though, so I'll need to land that patch to verify.
Depends on: 500273
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: