Closed
Bug 439375
Opened 16 years ago
Closed 16 years ago
Improve feGaussianBlur inner loop
Categories
(Core :: SVG, defect)
Core
SVG
Tracking
()
RESOLVED
FIXED
People
(Reporter: roc, Assigned: roc)
Details
Attachments
(2 files, 2 obsolete files)
2.30 KB,
application/xml
|
Details | |
13.97 KB,
patch
|
longsonr
:
review+
MatsPalmgren_bugz
:
superreview+
|
Details | Diff | Splinter Review |
Bug 399488 made us use a lookup table to speed up the integer division by boxSize that's important in the boxblur inner loop. With a little bit more work we can do better without a lookup table. The attached testcase does the following: -- backs out 399488, replacing the table lookup by approximating "v/boxSize" with "(v*K) >> 24" where K is chosen appropriately -- Unifies BoxBlurH/BoxBlurV into a single BoxBlur static function -- Does three passes over the same row or column of pixels before moving on to the next row or column (better memory locality) -- For the common case where the blur area is bigger than the boxSize, removes conditional branches from the inner loop by dividing the loop space into three segments: the segment where the left/top edge of the box is to the left/above the blur region; the segment where the box is entirely inside the blur region; and the segment where the right/bottom edge of the box is the right/below the blur region. (Mats, I'm asking you for sr because Tim is mostly out of action, Vlad and Stuart are on holiday, so we don't really have anyone capable of doing sr's for SVG. If you don't want to deal with this, let me know and I'll find someone else.)
Attachment #325217 -
Flags: superreview?(mats.palmgren)
Attachment #325217 -
Flags: review?(jwatt)
Assignee | ||
Comment 1•16 years ago
|
||
This is the testcase I used for performance analysis. When I produced this patch a while ago, I got these results on my Mac: Trunk: 94646ms With patch: 84084ms I've got some major filter restructuring and optimizations in my branch that get this down to 26896ms :-) ... I'll submit those after this patch has landed, I'm trying to avoid unnecessary conflicts.
Assignee | ||
Comment 2•16 years ago
|
||
Sorry, that was the wrong version of the patch. This is the right version.
Attachment #325217 -
Attachment is obsolete: true
Attachment #325219 -
Flags: superreview?(mats.palmgren)
Attachment #325219 -
Flags: review?(jwatt)
Attachment #325217 -
Flags: superreview?(mats.palmgren)
Attachment #325217 -
Flags: review?(jwatt)
Comment 3•16 years ago
|
||
Comment on attachment 325219 [details] [diff] [review] right patch I don't have the domain knowledge to feel comfortable doing the superreview here, sorry. All I can offer are a few nits: + * So we can try approxmating the division V/N as V*K/(2^24) (integer approximating + * division, 32-bit mutiply). Dividing by 2^24 is a simple shift so it's multiply +BoxBlur(PRUint8 *aInput, PRUint8 *aOutput, + PRInt32 aStrideMinor,PRInt32 aStartMinor, PRInt32 aEndMinor, + PRUint32 leftLobe, PRUint32 rightLobe) Missing space after comma. Not your fault, but I'd like consistent naming: aLeftLobe, aRightLobe. Using 'const' would make the code more readable too. + for (PRUint32 i=0; i < boxSize; i++) { + PRInt32 pos = aStartMinor - leftLobe + i; aStartMinor - leftLobe is a loop invariant. + for (PRInt32 j=0; j < 4; j++) { + sums[j] += aInput[aStrideMinor*pos + j]; } aStrideMinor*pos is a (inner) loop invariant. A matter of taste perhaps, but I find the old code (using four separate assignments instead of a loop) more readable. + for (PRInt32 j=0; j < 4; j++) { + sums[j] += aInput[aStrideMinor*next + j] - + aInput[aStrideMinor*last + j]; + } Loop invariants. And loop can be replaced with four assignments. > nsSVGFEGaussianBlurElement::GaussianBlur Looks like the odd/even branches could be merged relatively easy.
Attachment #325219 -
Flags: superreview?(mats.palmgren)
Assignee | ||
Comment 4•16 years ago
|
||
No sr other than me has any domain knowledge here, as far as I know. What do you think we should do?
Comment 5•16 years ago
|
||
The area where I felt my knowledge lacked is the Gaussian blur operation as such, I just can't tell if your code is a reasonable implementation of it. Looking at the rules for super-review it seems to allow, as an exception, a lack of domain knowledge (third paragraph): http://www.mozilla.org/hacking/reviewers.html So, if jwatt has the above domain knowledge (and looking at CVS blame it seems so) I think that would be sufficient and we're still within bounds of the rules if I sr. Does that seem righteous to you?
Assignee | ||
Comment 6•16 years ago
|
||
Yes, I think that's what we should do. To be honest, whatever "domain knowledge" I have about Gaussian blurs comes from reading this code and a couple of articles in Wikipedia, so that doesn't count for much :-). This patch is actually supposed to preserve the behaviour of the existing code. The only change in behaviour should be some approximation error in the approximate division, but hopefully the comments explain adequately why this is OK.
Assignee | ||
Comment 7•16 years ago
|
||
Addresses Mats' comments, hopefully. I didn't factor out some of the common subexpressions, because I couldn't think of a meaningful name for the variables. The compiler should be able to take care of it and they aren't in the really hot paths anyway. I'm not sure how you feel about the SUM macros, Mats, but I hope you'll be OK with this.
Attachment #325219 -
Attachment is obsolete: true
Attachment #325368 -
Flags: superreview?(mats.palmgren)
Attachment #325368 -
Flags: review?(jwatt)
Attachment #325219 -
Flags: review?(jwatt)
Comment 8•16 years ago
|
||
Comment on attachment 325368 [details] [diff] [review] fix v2 Macros are fine with me, but unfortunately re-defining SUM gives compile warnings, so please add a couple of #undef, or use different names... With that fixed, sr=mats (with my limited domain knowledge as noted in previous comments).
Attachment #325368 -
Flags: superreview?(mats.palmgren) → superreview+
Assignee | ||
Comment 9•16 years ago
|
||
Comment on attachment 325368 [details] [diff] [review] fix v2 switching review request on jwatt's recommendation
Attachment #325368 -
Flags: review?(jwatt) → review?(longsonr)
Comment 10•16 years ago
|
||
Comment on attachment 325368 [details] [diff] [review] fix v2 undefs are needed but that's part of comment 8 already.
Attachment #325368 -
Flags: review?(longsonr) → review+
Assignee | ||
Comment 11•16 years ago
|
||
checked in
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Comment 12•16 years ago
|
||
Another bug connected with gaussian blur: https://bugzilla.mozilla.org/show_bug.cgi?id=421318&
Comment 13•16 years ago
|
||
Can we get this performance test on tinderbox as part of Tgfx or whatever?
Flags: in-testsuite?
You need to log in
before you can comment on or make changes to this bug.
Description
•