Closed Bug 439375 Opened 16 years ago Closed 16 years ago

Improve feGaussianBlur inner loop

Categories

(Core :: SVG, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: roc, Assigned: roc)

Details

Attachments

(2 files, 2 obsolete files)

Attached patch fix (obsolete) — Splinter Review
Bug 399488 made us use a lookup table to speed up the integer division by boxSize that's important in the boxblur inner loop. With a little bit more work we can do better without a lookup table. The attached testcase does the following:

-- backs out 399488, replacing the table lookup by approximating "v/boxSize" with "(v*K) >> 24" where K is chosen appropriately

-- Unifies BoxBlurH/BoxBlurV into a single BoxBlur static function

-- Does three passes over the same row or column of pixels before moving on to the next row or column (better memory locality)

-- For the common case where the blur area is bigger than the boxSize, removes conditional branches from the inner loop by dividing the loop space into three segments: the segment where the left/top edge of the box is to the left/above the blur region; the segment where the box is entirely inside the blur region; and the segment where the right/bottom edge of the box is the right/below the blur region.

(Mats, I'm asking you for sr because Tim is mostly out of action, Vlad and Stuart are on holiday, so we don't really have anyone capable of doing sr's for SVG. If you don't want to deal with this, let me know and I'll find someone else.)
Attachment #325217 - Flags: superreview?(mats.palmgren)
Attachment #325217 - Flags: review?(jwatt)
This is the testcase I used for performance analysis.

When I produced this patch a while ago, I got these results on my Mac:
Trunk: 94646ms
With patch: 84084ms

I've got some major filter restructuring and optimizations in my branch that get this down to 26896ms :-) ... I'll submit those after this patch has landed, I'm trying to avoid unnecessary conflicts.
Attached patch right patch (obsolete) — Splinter Review
Sorry, that was the wrong version of the patch. This is the right version.
Attachment #325217 - Attachment is obsolete: true
Attachment #325219 - Flags: superreview?(mats.palmgren)
Attachment #325219 - Flags: review?(jwatt)
Attachment #325217 - Flags: superreview?(mats.palmgren)
Attachment #325217 - Flags: review?(jwatt)
Comment on attachment 325219 [details] [diff] [review]
right patch

I don't have the domain knowledge to feel comfortable doing
the superreview here, sorry.  All I can offer are a few nits:


+ * So we can try approxmating the division V/N as V*K/(2^24) (integer

approximating

+ * division, 32-bit mutiply). Dividing by 2^24 is a simple shift so it's

multiply

+BoxBlur(PRUint8 *aInput, PRUint8 *aOutput,
+        PRInt32 aStrideMinor,PRInt32 aStartMinor, PRInt32 aEndMinor,
+        PRUint32 leftLobe, PRUint32 rightLobe)

Missing space after comma.
Not your fault, but I'd like consistent naming: aLeftLobe, aRightLobe.
Using 'const' would make the code more readable too.

+  for (PRUint32 i=0; i < boxSize; i++) {
+    PRInt32 pos = aStartMinor - leftLobe + i;

aStartMinor - leftLobe is a loop invariant.

+    for (PRInt32 j=0; j < 4; j++) {
+      sums[j] += aInput[aStrideMinor*pos + j];
     }

aStrideMinor*pos is a (inner) loop invariant.
A matter of taste perhaps, but I find the old code (using four separate
assignments instead of a loop) more readable.

+      for (PRInt32 j=0; j < 4; j++) {
+        sums[j] += aInput[aStrideMinor*next + j] -
+                   aInput[aStrideMinor*last + j];
+      }

Loop invariants. And loop can be replaced with four assignments.


> nsSVGFEGaussianBlurElement::GaussianBlur

Looks like the odd/even branches could be merged relatively easy.
Attachment #325219 - Flags: superreview?(mats.palmgren)
No sr other than me has any domain knowledge here, as far as I know. What do you think we should do?
The area where I felt my knowledge lacked is the Gaussian blur operation
as such, I just can't tell if your code is a reasonable implementation of it.

Looking at the rules for super-review it seems to allow, as an exception,
a lack of domain knowledge (third paragraph):
http://www.mozilla.org/hacking/reviewers.html
So, if jwatt has the above domain knowledge (and looking at CVS blame it
seems so) I think that would be sufficient and we're still within bounds
of the rules if I sr.  Does that seem righteous to you?
Yes, I think that's what we should do.

To be honest, whatever "domain knowledge" I have about Gaussian blurs comes from reading this code and a couple of articles in Wikipedia, so that doesn't count for much :-).

This patch is actually supposed to preserve the behaviour of the existing code. The only change in behaviour should be some approximation error in the approximate division, but hopefully the comments explain adequately why this is OK.
Attached patch fix v2Splinter Review
Addresses Mats' comments, hopefully. I didn't factor out some of the common subexpressions, because I couldn't think of a meaningful name for the variables. The compiler should be able to take care of it and they aren't in the really hot paths anyway.

I'm not sure how you feel about the SUM macros, Mats, but I hope you'll be OK with this.
Attachment #325219 - Attachment is obsolete: true
Attachment #325368 - Flags: superreview?(mats.palmgren)
Attachment #325368 - Flags: review?(jwatt)
Attachment #325219 - Flags: review?(jwatt)
Comment on attachment 325368 [details] [diff] [review]
fix v2

Macros are fine with me, but unfortunately re-defining SUM gives compile
warnings, so please add a couple of #undef, or use different names...

With that fixed, sr=mats (with my limited domain knowledge as noted in
previous comments).
Attachment #325368 - Flags: superreview?(mats.palmgren) → superreview+
Comment on attachment 325368 [details] [diff] [review]
fix v2

switching review request on jwatt's recommendation
Attachment #325368 - Flags: review?(jwatt) → review?(longsonr)
Comment on attachment 325368 [details] [diff] [review]
fix v2

undefs are needed but that's part of comment 8 already.
Attachment #325368 - Flags: review?(longsonr) → review+
checked in
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Another bug connected with gaussian blur: https://bugzilla.mozilla.org/show_bug.cgi?id=421318&
Can we get this performance test on tinderbox as part of Tgfx or whatever?
Flags: in-testsuite?
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: