416305 - Speed up Gaussian blur filter considerably by optimizing for alpha-only surfaces

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Description

•

18 years ago

It's pretty common to use Gaussian blur with a SourceAlpha source to blur just the alpha channel, to get a shadow. Examples do this often (e.g. the infamous http://ejohn.org/files/ecma-cloud.svg ) and the spec even suggests optimizing for it. It's a really easy optimization to detect a SourceAlpha source here and should speed up such blurs by nearly a factor of 4. Probably worth doing for 1.9 since it's easy, and shadow blurs are common and one of our slowest filter operations.

Flags: blocking1.9?

Jesse Ruderman

Updated

•

18 years ago

Keywords: perf

Frédéric Buclin

Updated

•

18 years ago

Blocks: 403932

Jonathan Watt [:jwatt]

Updated

•

18 years ago

Blocks: 399488

Mike Connor [:mconnor]

Updated

•

17 years ago

Flags: tracking1.9? → blocking1.9?

Jonathan Watt [:jwatt]

Comment 1

•

17 years ago

roc, do you want to block on this?

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 2

•

17 years ago

No, but we could make it wanted1.9.

Jonathan Watt [:jwatt]

Comment 3

•

17 years ago

Agreed.

Flags: wanted1.9+

Flags: blocking1.9?

Flags: blocking1.9-

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Updated

•

17 years ago

Blocks: 420953

Robert Longson [:longsonr]

Updated

•

17 years ago

Blocks: 421358

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 4

•

17 years ago

Not for 1.9, but I'm working on it.

Flags: wanted1.9-

Flags: wanted1.9+

Flags: wanted-next+

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 5

•

17 years ago

I have a fix for this in my tree, works great. I'll break it out for submission later.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 6

•

17 years ago

Attached patch part 1: restructure filters — Details — Splinter Review

Okay, here's a big patch that restructures filters to lay the groundwork for the optimizations. There are a few things going on here: -- nsSVGFilterResource is eliminated. Everything the filter primitives need is provided to them directly. -- Instead of messing around with an "image dictionary", the filter primitive Filter methods are passed Image objects representing their inputs and write to an Image output object. The Image object contains a gfxImageSurface, a copy of the filter primitive subregion that generate the object, and the colormodel. -- The Filter method takes a new rect parameter indicating which pixels need to be computed. This rectangle is relative to the temporary image surfaces. -- Most of the filter logic is moved from nsSVGFilterFrame to nsSVGFilterInstance. An nsSVGFilterInstance contains the FilterAnalysis graph that we were building before, and uses that graph for various analyses as well as (now) the actual drawing. The image dictionary no longer exists because all the image dataflow is encoded in the graph now. I apologise for the big patch, but it actually does simplify the code quite a bit and makes it easy to develop additional analyses of the filter structure for big wins. This patch was not intended to change behaviour but it actually improved my gaussmark test from 84s to 81s.

Attachment #326837 - Flags: superreview?(mats.palmgren)

Attachment #326837 - Flags: review?(longsonr)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 7

•

17 years ago

That patch also has a potential big memory usage saving, because it frees image surfaces once we know there are no more uses in the filter pipeline. So a linear pipeline of N primitives goes from N+1 temporary image surfaces to 2.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 8

•

17 years ago

Attached patch part 2: fix — Details — Splinter Review

This fixes this bug. It's a tiny patch on top of the last one. This reduced the time for my gaussmark test from 81s to 65s. I experimented with hoisting the aAlphaOnly check out --- making a special alpha-only inner loop --- but that did not produce a significant performance improvement. Probably because the branch is extremely well predicted.

Attachment #326838 - Flags: superreview?(mats.palmgren)

Attachment #326838 - Flags: review?(longsonr)

Robert Longson [:longsonr]

Comment 9

•

17 years ago

Comment on attachment 326837 [details] [diff] [review] part 1: restructure filters >+ result.mSource = new gfxImageSurface(scaledSize, >+ gfxASurface::ImageFormatARGB32); >+ result.mTarget = new gfxImageSurface(scaledSize, >+ gfxASurface::ImageFormatARGB32); >+ if (!result.mSource || result.mSource->CairoStatus() || >+ !result.mTarget || result.mTarget->CairoStatus()) >+ { { should be on the previous line >+nsSVGFETurbulenceElement::Filter(nsSVGFilterInstance *instance, >+ const nsTArray<const Image*>& aSources, >+ const Image* aTarget, >+ const nsIntRect& rect) >+{ >+ PRUint8* targetData = aTarget->mImage->Data(); >+ PRUint32 stride = aTarget->mImage->Stride(); >+ >+ nsIntRect filterSubregion(PRInt32(aTarget->mFilterPrimitiveSubregion.X()), >+ PRInt32(aTarget->mFilterPrimitiveSubregion.Y()), >+ PRInt32(aTarget->mFilterPrimitiveSubregion.Width()), >+ PRInt32(aTarget->mFilterPrimitiveSubregion.Height())); Why not do this via the new GfxRectToIntRect function? >- if (x == rect.x) { >+ if (x == 0) { > xflag = 0; >- } else if (x == rect.XMost() - 1) { >+ } else if (x == rect.width - 1) { > xflag = 2; > } else { > xflag = 1; > } >- if (y == rect.y) { >+ if (y == 0) { > yflag = 0; >- } else if (y == rect.YMost() - 1) { >+ } else if (y == rect.height - 1) { > yflag = 2; > } else { > yflag = 1; > } This looks right but please ensure the testcase for bug 429774 is checked in as a crashtest to prove it. > protected: > virtual PRBool OperatesOnSRGB(nsSVGFilterInstance* aInstance, >- nsSVGString* aString) { >- if (aString == &mStringAttributes[IN1]) { >- return aInstance->LookupImageColorModel(aString->GetAnimValue()).mColorSpace == >- nsSVGFilterInstance::ColorModel::SRGB; >- } >+ PRUint32 aInput, Image* aImage) { >+ if (aInput == 0 && aImage) >+ return aImage->mColorModel.mColorSpace == ColorModel::SRGB; > just check aImage and ignore aInput to match comment in nsSVGFilters.h >+nsresult >+nsSVGFilterInstance::BuildSourceImages() >+{ ... >+ // Clear the color channel >+ PRUint32* src = reinterpret_cast<PRUint32*>(sourceColorAlpha->Data()); Nit: prefer const pointer for src. >+ PRUint32* dest = reinterpret_cast<PRUint32*>(mSourceAlpha.mImage.mImage->Data()); >+ for (PRInt32 yy = 0; yy < mSurfaceRect.height; yy++) { >+ PRUint32 rowOffset = (mSourceAlpha.mImage.mImage->Stride()*yy) >> 2; >+ for (PRInt32 xx = 0; xx < mSurfaceRect.width; xx++) { >+ dest[rowOffset + xx] = src[rowOffset + xx] & 0xFF000000U; >+ } >+ } Nit: Could just use x and y as the variable names now you've split this into separate functions.

Attachment #326837 - Flags: review?(longsonr) → review+

Robert Longson [:longsonr]

Updated

•

17 years ago

Attachment #326838 - Flags: review?(longsonr) → review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 10

•

17 years ago

(In reply to comment #9) > { should be on the previous line OK > Why not do this via the new GfxRectToIntRect function? Yeah OK > This looks right but please ensure the testcase for bug 429774 is checked in > as a crashtest to prove it. OK > > > protected: > > virtual PRBool OperatesOnSRGB(nsSVGFilterInstance* aInstance, > >- nsSVGString* aString) { > >- if (aString == &mStringAttributes[IN1]) { > >- return aInstance->LookupImageColorModel(aString->GetAnimValue()).mColorSpace == > >- nsSVGFilterInstance::ColorModel::SRGB; > >- } > >+ PRUint32 aInput, Image* aImage) { > >+ if (aInput == 0 && aImage) > >+ return aImage->mColorModel.mColorSpace == ColorModel::SRGB; > > > > just check aImage and ignore aInput to match comment in nsSVGFilters.h Not sure what you mean here. The code wants to leave the first input in its current format, and do the default for the other input and the output. So we need to test both aInput and aImage. > Nit: prefer const pointer for src. OK. > >+ PRUint32* dest = reinterpret_cast<PRUint32*>(mSourceAlpha.mImage.mImage->Data()); > >+ for (PRInt32 yy = 0; yy < mSurfaceRect.height; yy++) { > >+ PRUint32 rowOffset = (mSourceAlpha.mImage.mImage->Stride()*yy) >> 2; > >+ for (PRInt32 xx = 0; xx < mSurfaceRect.width; xx++) { > >+ dest[rowOffset + xx] = src[rowOffset + xx] & 0xFF000000U; > >+ } > >+ } > > Nit: Could just use x and y as the variable names now you've split this into > separate functions. OK. Mats, could you look at this? I've got a pretty long queue of stuff that follows this :-).

Robert Longson [:longsonr]

Comment 11

•

17 years ago

(In reply to comment #10) > Not sure what you mean here. The code wants to leave the first input in its > current format, and do the default for the other input and the output. So we > need to test both aInput and aImage. > My mistake, I read it wrong. Ignore that comment.

Mats Palmgren (inactive)

Comment 12

•

17 years ago

Comment on attachment 326837 [details] [diff] [review] part 1: restructure filters sr=mats > content/svg/content/src/nsSVGFilters.cpp >+CopyRect(...) > { >+ NS_ASSERTION(aDest->mImage->Stride() == aSrc->mImage->Stride(), "stride mismatch"); >+ CopyDataRect(aDest->mImage->Data(), aSrc->mImage->Data(), >+ aSrc->mImage->Stride(), aDataRect); Can we assert aDataRect is within bounds? (if so, in GaussianBlur() too)

Attachment #326837 - Flags: superreview?(mats.palmgren) → superreview+

Mats Palmgren (inactive)

Comment 13

•

17 years ago

Comment on attachment 326838 [details] [diff] [review] part 2: fix sr=mats - SUM(0); SUM(1); SUM(2); SUM(3); + if (!aAlphaOnly) { SUM(GFX_ARGB32_OFFSET_B); + SUM(GFX_ARGB32_OFFSET_G); + SUM(GFX_ARGB32_OFFSET_R); } + SUM(GFX_ARGB32_OFFSET_A); Why not use SUM_PIXEL() here too?

Attachment #326838 - Flags: superreview?(mats.palmgren) → superreview+

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 14

•

17 years ago

(In reply to comment #10) > > Why not do this via the new GfxRectToIntRect function? > > Yeah OK Actually we can't because GfxRectToIntRect fails if the input rectangle is not integer-aligned, so I left that as-is. (In reply to comment #12) > Can we assert aDataRect is within bounds? > (if so, in GaussianBlur() too) Yes, OK.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 15

•

17 years ago

Pushed part 1 with those comments addressed: a36e8ebbe3f8.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 16

•

17 years ago

(In reply to comment #13) > (From update of attachment 326838 [details] [diff] [review]) > sr=mats > > - SUM(0); SUM(1); SUM(2); SUM(3); > + if (!aAlphaOnly) { SUM(GFX_ARGB32_OFFSET_B); > + SUM(GFX_ARGB32_OFFSET_G); > + SUM(GFX_ARGB32_OFFSET_R); } > + SUM(GFX_ARGB32_OFFSET_A); > > Why not use SUM_PIXEL() here too? I could, but that would rely on cpp macros not using lexical scope (i.e. that SUM_PIXEL sees the redefinition of SUM, not the SUM in scope when SUM_PIXEL was declared), which seems a little bit too tricky to me.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 17

•

17 years ago

Pushed part 2: baa5a51b7f90

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Updated

•

17 years ago

Status: NEW → RESOLVED

Closed: 17 years ago

Resolution: --- → FIXED

Mats Palmgren (inactive)

Comment 18

•

17 years ago

(In reply to comment #16) FWIW, it looks like C and C++ standards requires it, 6.10.3.4/6.10.3.5: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf In the examples in 6.10.3.5 they have: #define x 3 #define f(a) f(x * (a)) #undef x #define x 2 ... f(y+1) results in f(2 * (y+1)) (I found a C++ draft that have the same text and example)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 19

•

17 years ago

I agree it's correct, I just prefer to avoid writing code that depends on obscure stuff like that.

Dão Gottwald [:dao]

Comment 20

•

17 years ago

I think this broke the shadow of the favicons in my Ctrl-Tab extension (<http://en.design-noir.de/mozilla/ctrl-tab/>): svg|*.ctrlTab-icon { filter: url(chrome://browser/content/browser.xul#ctrlTab-iconShadow); } ... <svg:filter id="ctrlTab-iconShadow"> <svg:feOffset result="shadow" in="SourceAlpha" dx="2" dy="-1"/> <svg:feColorMatrix result="transparent-shadow" in="shadow" values="1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0.5 0"/> <svg:feBlend in="SourceGraphic" in2="transparent-shadow"/> </svg:filter>

Dão Gottwald [:dao]

Comment 21

•

17 years ago

I also don't see the shadows on <http://ejohn.org/files/ecma-cloud.svg>. I'm on Windows.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Robert Longson [:longsonr]

Comment 22

•

17 years ago

I still see shadows on http://ejohn.org/files/ecma-cloud.svg on Windows.

Dão Gottwald [:dao]

Comment 23

•

17 years ago

Attached image missing filter — Details

Here's what I see. Works as expected on the 1.9.0 branch.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 24

•

17 years ago

Dão, you know as well as I do that we don't reopen bugs for regressions, unless we end up backing the patch out.

Status: REOPENED → RESOLVED

Closed: 17 years ago → 17 years ago

Resolution: --- → FIXED

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 25

•

17 years ago

I see the regression in the Windows nightly build.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Updated

•

17 years ago

Depends on: 445268

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 26

•

17 years ago

I found what I'm pretty sure is the regression, filed it as bug 445268 with a patch. It only shows up on Windows nightly builds though, so I'll need to land that patch to verify.

Jesse Ruderman

Updated

•

16 years ago

Depends on: 500273

part 1: restructure filters 17 years ago Robert O'Callahan (:roc) (email my personal email if necessary) 151.83 KB, patch	longsonr : review+ MatsPalmgren_bugz : superreview+	Details \| Diff \| Splinter Review
part 2: fix 17 years ago Robert O'Callahan (:roc) (email my personal email if necessary) 12.33 KB, patch	longsonr : review+ MatsPalmgren_bugz : superreview+	Details \| Diff \| Splinter Review
missing filter 17 years ago Dão Gottwald [:dao] 75.73 KB, image/jpeg		Details