Closed Bug 921495 Opened 11 years ago Closed 11 years ago

AlphaBoxBlur reduces opacity due to bad rounding

Categories

(Core :: Graphics, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla28

People

(Reporter: mstange, Assigned: mstange)

Details

Attachments

(3 files, 2 obsolete files)

Attached file testcase
In this testcase, the color values in the opaque white part of the shadow should all be #FFFFFF, but instead they are #FCFCFC. Blurring uses three passes and reduces the color values by 1 per pass due to a rounding problem.

I have patches locally that convert the SVG gaussian blur filter to use AlphaBoxBlur internally, and due to this bug, feGaussianBlur-1.svg and feGaussianBlur-2.svg currently fail.
Attached patch part 1: fix rounding (obsolete) — Splinter Review
The color values are calculated as
((uint64_t(1) << 32) / boxSize * valueSum) >> 32.
For a boxSize of 1521, for example, blurring only pixels with value 255 gives:
   ((uint64_t(1) << 32) / 1521 * (255 * 1521) >> 32
== (2823778                    * 387855     ) >> 32
== 1095216416190 >> 32
== 254                                    <-- !!
and 1095216416190 / 2.0e32 == 254.999943.

I think the right way to fix this is to use a rounding shift, or, equivalently, to add 2^31 before the shift.

This patch makes the DrawShadow200x200LargeRadius perftest slower by 4.3% on my machine, using the SSE2 code.
Attachment #811151 - Flags: review?(bas)
Attached patch part 2: regain performance (obsolete) — Splinter Review
This patch improves performance on the DrawShadow200x200LargeRadius perftest for me by 14.8% after the part1 patch.
Compared to the original code, part1 + part2 are 11.2% faster than the original code.
(Only taking part2 would make us 14.5% faster than the original code, but I'm not advocating that because I want the rounding fix.)
Attachment #811154 - Flags: review?(bas)
(In reply to Markus Stange [:mstange] from comment #1)
> and 1095216416190 / 2.0e32 == 254.999943.
                        ^ This was supposed to be a "2.0^32", of course.

Perf numbers for the Blur500x500x50 perftest (which is only on the Moz2d filters branch at the moment):

original:    544.396ms +/- 3.00381
only part1:  573.329ms +/- 9.1663   -- 5.3% slower than original
part1+part2: 473.078ms +/- 6.34367  -- 13.1% faster than original
only part2:  455.779ms +/- 5.02724  -- 16.3% faster than original
If a 5% regression is the cutoff for taking the rounding fix, we can reorder the numbers from comment 3:

original:    544.396ms +/- 3.00381
only part2:  455.779ms +/- 5.02724  -- 16.3% faster than original
part1+part2: 473.078ms +/- 6.34367  -- 3.7% slower than with speed fix

:-)
Comment on attachment 811154 [details] [diff] [review]
part 2: regain performance

Review of attachment 811154 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/2d/BlurSSE2.cpp
@@ +33,4 @@
>  
> +MOZ_ALWAYS_INLINE
> +__m128i BlurFourPixels(__m128i aTopLeft, __m128i aTopRight,
> +                       __m128i aBottomRight, __m128i aBottomLeft,

This actually does not compile on Windows because VC does not allow passing more than three __m128i arguments on the stack. I'll change it to const __m128i&, it will get inlined anyway.
This reorders the union so that this compiles on Windows.

Note to self: If we don't take this fix, I need to make feGaussianBlur-1.svg and feGaussianBlur-2.svg fuzzy (1,12800 and 3,1600) when landing filters.
Attachment #811151 - Attachment is obsolete: true
Attachment #811151 - Flags: review?(bas)
Attachment #8334154 - Flags: review?(bas)
Attachment #811154 - Attachment is obsolete: true
Attachment #811154 - Flags: review?(bas)
Attachment #8334155 - Flags: review?(bas)
Comment on attachment 8334154 [details] [diff] [review]
part 1: fix rounding

Review of attachment 8334154 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/2d/BlurSSE2.cpp
@@ +17,5 @@
> +  const __m128i mask = _mm_setr_epi32(0x0, 0xffffffff, 0x0, 0xffffffff);
> +  const union {
> +    int64_t i64[2];
> +    __m128i m;
> +  } roundingAddition = { { int64_t(1) << 31, int64_t(1) << 31 } };

I wonder if all compilers do this well. If you perftested this on all platforms I'm happy.
Attachment #8334154 - Flags: review?(bas) → review+
Attachment #8334155 - Flags: review?(bas) → review+
Good thing you made me check, it was slow on Windows. Making roundingAddition static fixes it, luckily.
Attachment #811151 - Flags: review?(bas)
Attachment #811154 - Flags: review?(bas)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: