Closed Bug 753010 (opened 13 years ago, closed 10 years ago)

Add a NEON optimized blur method

Status: RESOLVED FIXED

Milestone: mozilla38

People

(Reporter: jrmuizel, Assigned: ethlin)

References

(Depends on 1 open bug)

Attachments

(2 files)

Part 1 - Add neon method for blur operation (Ethan Lin [:ethlin], 10 years ago, 16.65 KB patch; mstange: review+)
Part 2 - Refactor some neon functions (Ethan Lin [:ethlin], 10 years ago, 6.60 KB patch; mstange: review+)

Jeff Muizelaar [:jrmuizel] (Reporter) • Description • 13 years ago

This would be handy to have.

Jeff Muizelaar [:jrmuizel] (Reporter) • Updated • 13 years ago

Blocks: 752029

Jeff Muizelaar [:jrmuizel] (Reporter) • Comment 1 • 13 years ago

See bug 509052 for an old SSE2 version.

Jeff Muizelaar [:jrmuizel] (Reporter) • Updated • 13 years ago

Depends on: 758825

Ethan Lin [:ethlin] (Assignee) • Updated • 10 years ago

Assignee: nobody → etlin

Ethan Lin [:ethlin] (Assignee) • Comment 2 • 10 years ago

Attached patch: Part 1 - Add neon method for blur operation

Add NEON functions for the blur operation to improve performance. The approach is similar to the SSE version.

Attachment #8551072 - Flags: feedback?(hshih)

Ethan Lin [:ethlin] (Assignee) • Updated • 10 years ago

Attachment #8551072 - Flags: feedback?(hshih) → review?(mstange)

Markus Stange [:mstange] • Comment 3 • 10 years ago

I'm amazed by how identical this looks to my patch in bug 1045865. The only differences seem to be in the last few lines of BlurNEON.cpp, and in the location of the call to vqmovn_u32 (which I've moved into Divide).
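Comment 3 mentions folding the vqmovn_u32 narrowing into Divide. As a portable illustration, here is a scalar C++ model of what one lane of such a Divide computes. It assumes, based on the `1 << 31` rounding term in the Divide hunk quoted in comment 6, that the divisor is a 32-bit fixed-point reciprocal of the box area (floor(2^32 / boxSize)) and that the 64-bit product is shifted right by 32. The names `MakeReciprocal` and `DivideLane` are hypothetical; this is a sketch, not the patch's NEON code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: 32-bit fixed-point reciprocal of the box area,
// i.e. floor(2^32 / boxSize). This formula is an assumption, not the patch.
uint32_t MakeReciprocal(uint32_t boxSize) {
  assert(boxSize > 1);  // boxSize == 1 would need 2^32, which does not fit in uint32_t
  return static_cast<uint32_t>((uint64_t(1) << 32) / boxSize);
}

// Scalar model of one lane of Divide: widening 32x32->64 multiply (as vmull_u32
// does per lane), add 2^31 so the shift rounds to nearest, keep the high 32 bits,
// then saturate to 16 bits the way vqmovn_u32 would once it lives inside Divide.
uint16_t DivideLane(uint32_t value, uint32_t reciprocal) {
  uint64_t product = uint64_t(value) * reciprocal + (uint64_t(1) << 31);
  uint32_t quotient = static_cast<uint32_t>(product >> 32);
  return static_cast<uint16_t>(std::min<uint32_t>(quotient, UINT16_MAX));
}
```

For example, a 3x3 box of pixels that are all 100 sums to 900, and `DivideLane(900, MakeReciprocal(9))` yields 100. Multiplying by a precomputed reciprocal is the usual way to avoid per-pixel integer division, which NEON has no vector instruction for.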

Markus Stange [:mstange] • Comment 5 • 10 years ago

Oh, and our GenerateIntegralImage_NEON implementations are completely different.
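Both implementations build an integral image (summed-area table), so that any box sum needs only four lookups; presumably this is the combination BlurFourPixels evaluates for four adjacent boxes at once, given its topLeft/topRight/bottomRight/bottomLeft parameters. The scalar C++ sketch below shows the technique on a tightly packed 8-bit buffer. It ignores the lobe and inflation padding that GenerateIntegralImage_NEON handles, and the names `BuildIntegralImage` and `BoxSum` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds a (width+1) x (height+1) summed-area table: entry (x, y) holds the sum
// of all pixels strictly above and to the left of (x, y). Row 0 and column 0 are zero.
std::vector<uint32_t> BuildIntegralImage(const std::vector<uint8_t>& pixels,
                                         size_t width, size_t height) {
  assert(pixels.size() == width * height);
  const size_t stride = width + 1;
  std::vector<uint32_t> integral(stride * (height + 1), 0);
  for (size_t y = 0; y < height; ++y) {
    uint32_t rowSum = 0;  // running sum of source row y up to column x
    for (size_t x = 0; x < width; ++x) {
      rowSum += pixels[y * width + x];
      integral[(y + 1) * stride + (x + 1)] = integral[y * stride + (x + 1)] + rowSum;
    }
  }
  return integral;
}

// Sum of the box [x0, x1) x [y0, y1) from four corner lookups.
uint32_t BoxSum(const std::vector<uint32_t>& integral, size_t width,
                size_t x0, size_t y0, size_t x1, size_t y1) {
  const size_t stride = width + 1;
  uint32_t topLeft = integral[y0 * stride + x0];
  uint32_t topRight = integral[y0 * stride + x1];
  uint32_t bottomLeft = integral[y1 * stride + x0];
  uint32_t bottomRight = integral[y1 * stride + x1];
  // Unsigned wraparound in the intermediate subtraction cancels out.
  return bottomRight - topRight - bottomLeft + topLeft;
}
```

The box sum is then divided by the box area (see the reciprocal-based Divide discussed in comment 6) to get the blurred pixel, so the cost per output pixel is constant regardless of blur radius.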

Markus Stange [:mstange] • Comment 6 • 10 years ago

Comment on attachment 8551072: Part 1 - Add neon method for blur operation

Review of attachment 8551072:
-----------------------------------------------------------------

This looks great. I reviewed it by comparing it to my implementation; my suggestions below are basically just those differences where I preferred my version.

::: gfx/2d/BlurNEON.cpp
@@ +13,5 @@
> +uint32x4_t Divide(uint32x4_t aValues, uint32x4_t aDivisor)
> +{
> +  uint64x2_t roundingAddition = vdupq_n_u64(int64_t(1) << 31);
> +  uint64x2_t multiplied21 = vmull_u32(vget_low_u32(aValues), vget_low_u32(aDivisor));
> +  uint64x2_t multiplied43 = vmull_u32(vget_high_u32(aValues), vget_high_u32(aDivisor));

vget_low_u32(aDivisor) is always the same as vget_high_u32(aDivisor), isn't it? Would it make sense to pass aDivisor as uint32x2_t instead?

@@ +201,5 @@
> +  GenerateIntegralImage_NEON(leftInflation, aRightLobe, aTopLobe, aBottomLobe,
> +                             aIntegralImage, aIntegralImageStride, aData,
> +                             mStride, size);
> +
> +  uint32x4_t divisor = vdupq_n_u32(reciprocal);

Right, so this can be uint32x2_t and vdup_n_u32.

@@ +257,5 @@
> +      bottomLeft = vld1q_u32(bottomLeftBase + x + 12);
> +      uint32x4_t result4 = BlurFourPixels(topLeft, topRight, bottomRight, bottomLeft, divisor);
> +
> +      uint8x8_t combine1 = vqmovn_u16(vcombine_u16(vqmovn_u32(result1), vqmovn_u32(result2)));
> +      uint8x8_t combine2 = vqmovn_u16(vcombine_u16(vqmovn_u32(result3), vqmovn_u32(result4)));

Instead of calling vqmovn_u32 here every time, just move it into Divide.

@@ +281,5 @@
> +      uint8x8_t final = vqmovn_u16(vcombine_u16(vqmovn_u32(result), vdup_n_u16(0)));
> +      vst1_lane_u8(data + stride * y + x , final, 0);
> +      vst1_lane_u8(data + stride * y + x + 1, final, 1);
> +      vst1_lane_u8(data + stride * y + x + 2, final, 2);
> +      vst1_lane_u8(data + stride * y + x + 3, final, 3);

So my patch did this instead:

  uint32x2_t final = vreinterpret_u32_u8(vmovn_u16(vcombine_u16(result, vdup_n_u16(0))));
  *(uint32_t*)(data + stride * y + x) = vget_lane_u32(final, 0);

I don't think I've tested whether that works. Do you think it would work? Would it be faster?
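The last suggestion replaces four single-byte lane stores with one 32-bit store. The scalar C++ sketch below models that packing step: narrow four 16-bit results to bytes, then write them with a single 4-byte `memcpy`. `StoreFourPixels` is a hypothetical name, and using `memcpy` instead of the `*(uint32_t*)` cast is a swapped-in portable spelling, not what either patch does; the cast on a byte pointer can run afoul of C++ alignment and strict-aliasing rules, while a fixed-size `memcpy` is typically lowered by compilers to one store.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Scalar model of the narrowing store: clamp four 16-bit results to bytes
// (the saturating behaviour of vqmovn_u16), then write all four bytes with a
// single 4-byte memcpy instead of four separate one-byte stores.
void StoreFourPixels(uint8_t* row, size_t x, const uint16_t results[4]) {
  assert(row != nullptr);
  uint8_t packed[4];
  for (int i = 0; i < 4; ++i) {
    packed[i] = static_cast<uint8_t>(std::min<uint16_t>(results[i], 255));
  }
  // memcpy avoids the alignment and strict-aliasing issues of
  // *(uint32_t*)(row + x); compilers usually emit one 32-bit store here.
  std::memcpy(row + x, packed, sizeof(packed));
}
```

For a box average of 8-bit inputs the rounded quotient never exceeds 255, so the saturating narrowing (vqmovn_u16) in the original patch and the truncating narrowing (vmovn_u16) in the suggested version produce the same bytes.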

Attachment #8551072 - Flags: review?(mstange) → review+

Markus Stange [:mstange] • Comment 7 • 10 years ago

Comment on attachment 8551072: Part 1 - Add neon method for blur operation

Review of attachment 8551072:
-----------------------------------------------------------------

::: gfx/2d/BlurNEON.cpp
@@ +21,5 @@
> +
> +MOZ_ALWAYS_INLINE
> +uint32x4_t BlurFourPixels(const uint32x4_t& aTopLeft, const uint32x4_t& aTopRight,
> +  const uint32x4_t& aBottomRight, const uint32x4_t& aBottomLeft,
> +  const uint32x4_t& aDivisor)

Nit: fix the indentation of the continuation lines.

Ethan Lin [:ethlin] (Assignee) • Comment 8 • 10 years ago

Attached patch: Part 2 - Refactor some neon functions

Thanks for the recommendations. I tested the performance of the last suggestion; your changes work correctly and are faster.

Attachment #8553468 - Flags: review?(mstange)

Markus Stange [:mstange] • Comment 9 • 10 years ago

Comment on attachment 8553468: Part 2 - Refactor some neon functions

Thanks!

Attachment #8553468 - Flags: review?(mstange) → review+

Ethan Lin [:ethlin] (Assignee) • Comment 10 • 10 years ago

Please land attachment 8551072 and attachment 8553468 on mozilla-central. Try server result: https://treeherder.mozilla.org/#/jobs?repo=try&revision=20e689296bc2

Keywords: checkin-needed

Carsten Book [:Tomcat] • Comment 11 • 10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/afd9fa40a02e
https://hg.mozilla.org/integration/mozilla-inbound/rev/09f774177683

Keywords: checkin-needed

Carsten Book [:Tomcat] • Comment 12 • 10 years ago

https://hg.mozilla.org/mozilla-central/rev/afd9fa40a02e
https://hg.mozilla.org/mozilla-central/rev/09f774177683

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla38

Peter Chang [:pchang] • Updated • 10 years ago

Blocks: gfxperf


Categories

(Core :: Graphics, defect)

Security

(public)
