Closed
Bug 753010
Opened 13 years ago
Closed 10 years ago
Add a NEON optimized blur method
Categories
(Core :: Graphics, defect)
Tracking
()
RESOLVED
FIXED
mozilla38
People
(Reporter: jrmuizel, Assigned: ethlin)
References
(Depends on 1 open bug)
Details
Attachments
(2 files)
16.65 KB,
patch
|
mstange
:
review+
|
Details | Diff | Splinter Review |
6.60 KB,
patch
|
mstange
:
review+
|
Details | Diff | Splinter Review |
This would be handy to have.
Reporter | ||
Comment 1•13 years ago
|
||
See bug 509052 for an old SSE2 version.
Assignee | ||
Updated•10 years ago
|
Assignee: nobody → etlin
Assignee | ||
Comment 2•10 years ago
|
||
Add neon functions for blur to speed up performance. The method is similar with SSE version.
Attachment #8551072 -
Flags: feedback?(hshih)
Assignee | ||
Updated•10 years ago
|
Attachment #8551072 -
Flags: feedback?(hshih) → review?(mstange)
Comment 3•10 years ago
|
||
I'm amazed by how identical this looks to my patch in bug 1045865. The only differences seem to be in the last few lines of BlurNEON.cpp, and in the location of the call to vqmovn_u32 (which I've moved into Divide).
Comment 5•10 years ago
|
||
Oh, and our GenerateIntegralImage_NEON implementation is completely different.
Comment 6•10 years ago
|
||
Comment on attachment 8551072 [details] [diff] [review]
Part 1 - Add neon method for blur operation
Review of attachment 8551072 [details] [diff] [review]:
-----------------------------------------------------------------
This looks great. I reviewed it by comparing it to my implementation; my suggestions below are basically just those differences where I preferred my version.
::: gfx/2d/BlurNEON.cpp
@@ +13,5 @@
> +uint32x4_t Divide(uint32x4_t aValues, uint32x4_t aDivisor)
> +{
> + uint64x2_t roundingAddition = vdupq_n_u64(int64_t(1) << 31);
> + uint64x2_t multiplied21 = vmull_u32(vget_low_u32(aValues), vget_low_u32(aDivisor));
> + uint64x2_t multiplied43 = vmull_u32(vget_high_u32(aValues), vget_high_u32(aDivisor));
vget_low_u32(aDivisor) is always the same as vget_high_u32(aDivisor), isn't it? Would it make sense to pass aDivisor as uint32x2_t instead?
@@ +201,5 @@
> + GenerateIntegralImage_NEON(leftInflation, aRightLobe, aTopLobe, aBottomLobe,
> + aIntegralImage, aIntegralImageStride, aData,
> + mStride, size);
> +
> + uint32x4_t divisor = vdupq_n_u32(reciprocal);
Right, so this can be uint32x2_t and vdupq_n_u32.
@@ +257,5 @@
> + bottomLeft = vld1q_u32(bottomLeftBase + x + 12);
> + uint32x4_t result4 = BlurFourPixels(topLeft, topRight, bottomRight, bottomLeft, divisor);
> +
> + uint8x8_t combine1 = vqmovn_u16(vcombine_u16(vqmovn_u32(result1), vqmovn_u32(result2)));
> + uint8x8_t combine2 = vqmovn_u16(vcombine_u16(vqmovn_u32(result3), vqmovn_u32(result4)));
Instead of calling vqmovn_u32 here every time, just move it into Divide.
@@ +281,5 @@
> + uint8x8_t final = vqmovn_u16(vcombine_u16(vqmovn_u32(result), vdup_n_u16(0)));
> + vst1_lane_u8(data + stride * y + x , final, 0);
> + vst1_lane_u8(data + stride * y + x + 1, final, 1);
> + vst1_lane_u8(data + stride * y + x + 2, final, 2);
> + vst1_lane_u8(data + stride * y + x + 3, final, 3);
So my patch did this instead:
uint32x2_t final = vreinterpret_u32_u8(vmovn_u16(vcombine_u16(result, vdup_n_u16(0))));
*(uint32_t*)(data + stride * y + x) = vget_lane_u32(final, 0);
I don't think I've tested whether that works. Do you think it would work? Would it be faster?
Attachment #8551072 -
Flags: review?(mstange) → review+
Comment 7•10 years ago
|
||
Comment on attachment 8551072 [details] [diff] [review]
Part 1 - Add neon method for blur operation
Review of attachment 8551072 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/2d/BlurNEON.cpp
@@ +21,5 @@
> +
> +MOZ_ALWAYS_INLINE
> +uint32x4_t BlurFourPixels(const uint32x4_t& aTopLeft, const uint32x4_t& aTopRight,
> + const uint32x4_t& aBottomRight, const uint32x4_t& aBottomLeft,
> + const uint32x4_t& aDivisor)
indent
Assignee | ||
Comment 8•10 years ago
|
||
Thanks for the recommendations. I tested the performance of the last part and your changes are correct and faster.
Attachment #8553468 -
Flags: review?(mstange)
Comment 9•10 years ago
|
||
Comment on attachment 8553468 [details] [diff] [review]
Part 2 - Refactor some neon functions
thanks!
Attachment #8553468 -
Flags: review?(mstange) → review+
Assignee | ||
Comment 10•10 years ago
|
||
Please land the attachment 8551072 [details] [diff] [review] and attachment 8553468 [details] [diff] [review] to mozilla-central.
Try server result: https://treeherder.mozilla.org/#/jobs?repo=try&revision=20e689296bc2
Keywords: checkin-needed
Comment 11•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/afd9fa40a02e
https://hg.mozilla.org/integration/mozilla-inbound/rev/09f774177683
Keywords: checkin-needed
Comment 12•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/afd9fa40a02e
https://hg.mozilla.org/mozilla-central/rev/09f774177683
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
You need to log in
before you can comment on or make changes to this bug.
Description
•