Closed Bug 1116070 Opened 10 years ago Closed 10 years ago

Improve the performance of mask image computation by neon

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla37

People

(Reporter: ethlin, Assigned: ethlin)

Details

Attachments

(1 file, 5 obsolete files)

v1 - accumulation index and use float 10 years ago Ethan Lin[:ethlin] 3.94 KB, patch	longsonr : review+ jerry : feedback+	Details \| Diff \| Splinter Review
v2 - Use neon to speed up 10 years ago Ethan Lin[:ethlin] 8.46 KB, patch	longsonr : review-	Details \| Diff \| Splinter Review
v3 - Use neon to speed up 10 years ago Ethan Lin[:ethlin] 26.50 KB, patch		Details \| Diff \| Splinter Review
v4 - Use neon to speed up 10 years ago Ethan Lin[:ethlin] 12.82 KB, patch		Details \| Diff \| Splinter Review
v5 - Use neon to speed up 10 years ago Ethan Lin[:ethlin] 12.79 KB, patch	longsonr : review+	Details \| Diff \| Splinter Review
v6 - Use neon to speed up. r=longsonr 10 years ago Ethan Lin[:ethlin] 12.80 KB, patch	ethlin : review+	Details \| Diff \| Splinter Review

Ethan Lin[:ethlin]

Assignee

Description

•

10 years ago

When generating the SVG mask, we used to calculate index by pixel. There is a way to get index by accumulating. Another problem is the luminance calculation is using double type, but the decimal digits of precision is only 4. I think we can just use float type to do the calculation.

Cameron McCormack (:heycam)

Comment 1

•

10 years ago

(In reply to Ethan Lin[:Ethan] from comment #0) > When generating the SVG mask, we used to calculate index by pixel. There is > a way to get index by accumulating. Another problem is the luminance > calculation is using double type, but the decimal digits of precision is > only 4. I think we can just use float type to do the calculation. Sounds good to me. BTW if you have content that's using SVG masks and it's slow, consider using alpha masks, which should be much faster. (Ideally we'd use a mask layer to handle SVG masks, though I'm not sure whether we have plans to do that any time soon?)

Ethan Lin[:ethlin]

Assignee

Comment 2

•

10 years ago

Attached patch v1 - accumulation index and use float (obsolete) — Details — Splinter Review

There are 2 changes: 1. Calculate index by accumulation. Which can reduce the cross operations in the for loop. 2. Use float instead of double. The calculation of luminance will be faster. I tried the 480x360 mask image, the time of mask computation is from 12.5ms to 10.6ms on flame-kk.

Attachment #8542072 - Flags: feedback?(hshih)

Robert Longson [:longsonr]

Updated

•

10 years ago

Attachment #8542072 - Flags: review+

Jerry Shih[:jerry] (UTC+8) (inactive)

Updated

•

10 years ago

Attachment #8542072 - Flags: feedback?(hshih) → feedback+

Ethan Lin[:ethlin]

Assignee

Comment 3

•

10 years ago

Attached patch v2 - Use neon to speed up (obsolete) — Details — Splinter Review

There is a bug 1106909 about the scrolling performance problem on the mobile lockscreen and we found that the SVG mask uses a lot of time in each frame. In this patch, I try to use neon to speed up the mask creation. I also change the luminance calculation to uint8 computation. This is the formula: (76*R + 150*G + 29*B) >> 8 [1]. By this patch, the creation time of 480x360 in flame-kk is from the original 12.5ms to 1.5ms. [1] http://en.wikipedia.org/wiki/YUV

Attachment #8542072 - Attachment is obsolete: true

Attachment #8542888 - Flags: review?(longsonr)

Robert Longson [:longsonr]

Comment 4

•

10 years ago

Comment on attachment 8542888 [details] [diff] [review] v2 - Use neon to speed up Everything is cosmetic really. I would like to see another patch mainly because I need more time to review the NEON code. >+ int rFac = 76*aOpacity; >+ int gFac = 150*aOpacity; >+ int bFac = 29*aOpacity; needs spaces round * shouldn't use int try int32_t (multiple places) I don't mind magic numbers from the SVG specification but 76 isn't. We can afford floating point calculations outside loops so use the original constants and multiply by 255 rFax, gFac and bFac don't mean much. Don't abbreviate and try to think of better names please. >+ int offset = aStride - 4 * aSize.width; int32_t >+ int offset = aStride - 4 * aSize.width; >+ uint8_t *pixel = aData; >+ int rFac = 76*aOpacity; >+ int gFac = 150*aOpacity; >+ int bFac = 29*aOpacity; >+ ditto >+/** >+ * Byte offsets of channels in a native packed gfxColor or cairo image surface. >+ */ >+#ifdef IS_BIG_ENDIAN >+#define GFX_ARGB32_OFFSET_A 0 >+#define GFX_ARGB32_OFFSET_R 1 >+#define GFX_ARGB32_OFFSET_G 2 >+#define GFX_ARGB32_OFFSET_B 3 >+#else >+#define GFX_ARGB32_OFFSET_A 3 >+#define GFX_ARGB32_OFFSET_R 2 >+#define GFX_ARGB32_OFFSET_G 1 >+#define GFX_ARGB32_OFFSET_B 0 >+#endif Be nice if these moved to a header so there's just the one copy. Do we actually support NEON with both endiannesses though? >+ >+void >+ComputesRGBLuminanceMask_NEON(uint8_t *aData, >+ int32_t aStride, >+ const IntSize &aSize, >+ float aOpacity) >+{ >+ int rFac = 76*aOpacity; >+ int gFac = 150*aOpacity; >+ int bFac = 29*aOpacity; ditto. >+ >+ uint8_t *pixel = aData; >+ int offset = aStride - 4 * aSize.width; >+ for (int32_t y = 0; y < aSize.height; y++) { >+ for (int32_t x = 0; x < aSize.width; x+=8) { spaces round += >+ if (!pixel[GFX_ARGB32_OFFSET_A]) { >+ memset(pixel, 0, 4); >+ } >+ pixel += 4; >+ } >+ pixel += offset; >+ } >+ >+ pixel = aData; >+ int remainWidth = aSize.width & 8; Please don't use int. >+ uint16x8_t temp; >+ uint8x8_t gray; >+ uint8x8x4_t result; >+ uint8x8_t rVec = vdup_n_u8(rFac); >+ uint8x8_t gVec = vdup_n_u8(gFac); >+ uint8x8_t bVec = vdup_n_u8(bFac); >+ for (int32_t y = 0; y < aSize.height; y++) { >+ for (int32_t x = 0; x < aSize.width; x+=8) { spaces round += please. >+ uint8x8x4_t argb = vld4_u8(pixel); >+ temp = vmull_u8(argb.val[GFX_ARGB32_OFFSET_R], rVec); >+ temp = vmlal_u8(temp, argb.val[GFX_ARGB32_OFFSET_G], gVec); >+ temp = vmlal_u8(temp, argb.val[GFX_ARGB32_OFFSET_B], bVec); >+ gray = vshrn_n_u16(temp, 8); >+ result.val[0] = gray; >+ result.val[1] = gray; >+ result.val[2] = gray; >+ result.val[3] = gray; >+ vst4_u8(pixel, result); >+ pixel += 8 * 4; >+ } >+ >+ for (int32_t x = 0; x < remainWidth; ++x) { >+ pixel[0] = (rFac * pixel[GFX_ARGB32_OFFSET_R] + >+ gFac * pixel[GFX_ARGB32_OFFSET_G] + >+ bFac * pixel[GFX_ARGB32_OFFSET_B]) >> 8; >+ memset(pixel+1, pixel[0], 3); >+ pixel+=4; spaces around += please and + (line above) >diff --git a/layout/svg/nsSVGMaskFrameNEON.h b/layout/svg/nsSVGMaskFrameNEON.h >new file mode 100644 >index 0000000..74db889 >--- /dev/null >+++ b/layout/svg/nsSVGMaskFrameNEON.h >@@ -0,0 +1,20 @@ >+/* -*- mode: c++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */ >+/* this source code form is subject to the terms of the mozilla public >+ * license, v. 2.0. if a copy of the mpl was not distributed with this file, >+ * You can obtain one at http://mozilla.org/MPL/2.0/. */ >+ >+#ifndef __NS_SVGMASKFRAMENEON_H__ >+#define __NS_SVGMASKFRAMENEON_H__ >+ >+#include "mozilla/gfx/2D.h" >+ >+using namespace mozilla; >+using namespace mozilla::gfx; Do you need both (or in fact either) of these lines?

Attachment #8542888 - Flags: review?(longsonr) → review-

Ethan Lin[:ethlin]

Assignee

Comment 5

•

10 years ago

Attached patch v3 - Use neon to speed up (obsolete) — Details — Splinter Review

I fix the coding style and some try server errors. I think neon should support both endiannesses because we also use neon in other places.

Attachment #8543716 - Flags: review?(longsonr)

Ethan Lin[:ethlin]

Assignee

Updated

•

10 years ago

Attachment #8542888 - Attachment is obsolete: true

Ethan Lin[:ethlin]

Assignee

Comment 6

•

10 years ago

Attached patch v4 - Use neon to speed up (obsolete) — Details — Splinter Review

Last patch contained wrong file. This is the correct one. Fix the coding style and some try server errors.

Attachment #8543716 - Attachment is obsolete: true

Attachment #8543716 - Flags: review?(longsonr)

Attachment #8543820 - Flags: review?(longsonr)

Ethan Lin[:ethlin]

Assignee

Comment 7

•

10 years ago

Comment on attachment 8543820 [details] [diff] [review] v4 - Use neon to speed up Review of attachment 8543820 [details] [diff] [review]: ----------------------------------------------------------------- ::: layout/reftests/svg/svg-integration/reftest.list @@ +26,5 @@ > == filter-html-zoomed-01.xhtml filter-html-01-ref.svg > == mask-html-01.xhtml mask-html-01-ref.svg > == mask-html-01-extref-01.xhtml mask-html-01-ref.svg > random == mask-html-01-extref-02.xhtml mask-html-01-ref.svg # random due to bug 877661 > +fuzzy-if(B2G&&browserIsRemote,1,2300) == mask-html-zoomed-01.xhtml mask-html-01-ref.svg I change the fuzzy tolerant number from 2000 to 2300 of the test case 'mask-html-zoomed-01.xhtml'. That is because the number of different pixels before calculating luminance is 2250, and the original luminance calculation has some error from float to integer. After the float conversion, the number of different pixels become less than 2000 coincidentally. With the new luminance calculation, there is no error of float conversion. So the number become 2250. That is why I have to change the tolerant number.

Robert Longson [:longsonr]

Comment 8

•

10 years ago

Comment on attachment 8543820 [details] [diff] [review] v4 - Use neon to speed up + luminance = pixel[0] = (redFactor * pixel[GFX_ARGB32_OFFSET_R] + + greenFactor * pixel[GFX_ARGB32_OFFSET_G] + + blueFactor * pixel[GFX_ARGB32_OFFSET_B]) >> 8; what's the pixel[0] doing? Seems like that could be removed, no? >+ pixel = aData; >+ int32_t remainWidth = aSize.width % 8; Call this remainderWidth (or widthRemainder) as that's clearer. I really don't like abbreviating variables. >+ int32_t neonWidth = aSize.width - remainWidth; roundedWidth perhaps? neonWidth doesn't really give much away as to what it is. >+ uint16x8_t temp; >+ uint8x8_t gray; >+ uint8x8x4_t result; >+ uint8x8_t redVec = vdup_n_u8(redFactor); >+ uint8x8_t greenVec = vdup_n_u8(greenFactor); >+ uint8x8_t blueVec = vdup_n_u8(blueFactor); >+ for (int32_t y = 0; y < aSize.height; y++) { >+ // Calculate luminance by neon with 8 pixels per loop >+ for (int32_t x = 0; x < neonWidth; x += 8) { >+ uint8x8x4_t argb = vld4_u8(pixel); >+ temp = vmull_u8(argb.val[GFX_ARGB32_OFFSET_R], redVec); // temp = red * redFactor >+ temp = vmlal_u8(temp, argb.val[GFX_ARGB32_OFFSET_G], greenVec); // temp += green * greenFactor >+ temp = vmlal_u8(temp, argb.val[GFX_ARGB32_OFFSET_B], blueVec); // temp += blue * blueFactor >+ gray = vshrn_n_u16(temp, 8); // gray = temp >> 8 >+ >+ // Put the result to the 8 pixels >+ result.val[0] = gray; >+ result.val[1] = gray; >+ result.val[2] = gray; >+ result.val[3] = gray; >+ vst4_u8(pixel, result); >+ pixel += 8 * 4; What if pixel[GFX_ARGB32_OFFSET_A] == 0? You set all the right values in the first place (in code I've elided) but then we overwrite them here with the wrong values don't you. >+ } >+ >+ // Calculate the reset pixels of the line by cpu s/reset/rest/ ? >+ for (int32_t x = 0; x < remainWidth; x++) { >+ pixel[0] = (redFactor * pixel[GFX_ARGB32_OFFSET_R] + >+ greenFactor * pixel[GFX_ARGB32_OFFSET_G] + >+ blueFactor * pixel[GFX_ARGB32_OFFSET_B]) >> 8; >+ memset(pixel+1, pixel[0], 3); spaces round + please? Same as above, what if pixel[GFX_ARGB32_OFFSET_A] == 0? >+ pixel += 4; >+ } >+ pixel += offset; >+ } >+} >+ >diff --git a/layout/svg/nsSVGMaskFrameNEON.h b/layout/svg/nsSVGMaskFrameNEON.h >new file mode 100644 >index 0000000..6ce7a0a >--- /dev/null >+++ b/layout/svg/nsSVGMaskFrameNEON.h >@@ -0,0 +1,20 @@ >+/* -*- mode: c++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */ >+/* this source code form is subject to the terms of the mozilla public >+ * license, v. 2.0. if a copy of the mpl was not distributed with this file, >+ * You can obtain one at http://mozilla.org/MPL/2.0/. */ >+ >+#ifndef __NS_SVGMASKFRAMENEON_H__ >+#define __NS_SVGMASKFRAMENEON_H__ >+ >+#include "mozilla/gfx/2D.h" >+ >+//using namespace mozilla; Nit: No commented out code please, remove the above line.

Ethan Lin[:ethlin]

Assignee

Comment 9

•

10 years ago

Attached patch v5 - Use neon to speed up (obsolete) — Details — Splinter Review

This patch fix the above coding style problems. About the case of alpha with zero value, I think the result should be correct. The first loop sets all channel to zero when alpha is zero, so the following luminance calculation will still get the result as zero. That's something like a pre-process for neon to prevent the alpha check when calculating 8 pixels at a time.

Attachment #8543820 - Attachment is obsolete: true

Attachment #8543820 - Flags: review?(longsonr)

Attachment #8544322 - Flags: review?(longsonr)

Robert Longson [:longsonr]

Comment 10

•

10 years ago

I'll look at this later btw have you seen bug 1025928, especially bug 1025928 comment 3. Up to you whether you combine that idea or take that bug or ignore this comment altogether :-)

Robert Longson [:longsonr]

Comment 11

•

10 years ago

Another totally followup idea would be, could we create an alpha only surface for mask? 1/4 the memory and data to write if it works of course.

Robert Longson [:longsonr]

Updated

•

10 years ago

Attachment #8544322 - Flags: review?(longsonr) → review+

Robert Longson [:longsonr]

Comment 12

•

10 years ago

Don't forget to add r=longsonr on the checkin comment :-)

Ethan Lin[:ethlin]

Assignee

Comment 13

•

10 years ago

(In reply to Robert Longson from comment #11) > Another totally followup idea would be, could we create an alpha only > surface for mask? 1/4 the memory and data to write if it works of course. Right, I tried this method yesterday. I used a A8 format surface to store the mask result and also only set alpha channel(only has this channel now) in luminance calculation. For 480x360 image with neon, this will improve about 2.5ms when applying mask. I think we will have more benefits with cpu computation. I will add this in next patch. Another idea is using cache, and this is what I am trying. If the mask has no change, we could just use the last one. Do you think it's a correct direction? I am still studying how to make sure there is no change of the mask.

Flags: needinfo?(longsonr)

Robert Longson [:longsonr]

Comment 14

•

10 years ago

Caching sounds difficult. How would you track changes in the mask's content? SMIL changes would be hard to track I think (use elements suffer from that problem already). The mask may also be applied to many elements, each of different size requiring different bitmap sizes.

Flags: needinfo?(longsonr)

Ethan Lin[:ethlin]

Assignee

Comment 15

•

10 years ago

Attached patch v6 - Use neon to speed up. r=longsonr — Details — Splinter Review

Add reviewer to comment. I will create another bug for further improvement.

Attachment #8544322 - Attachment is obsolete: true

Attachment #8544508 - Flags: review+

Ethan Lin[:ethlin]

Assignee

Updated

•

10 years ago

Summary: Improve the performance of mask image computation → Improve the performance of mask image computation by neon

Ethan Lin[:ethlin]

Assignee

Comment 16

•

10 years ago

Please land the attachment 8544508 [details] [diff] [review] to mozilla-central. Try server result: https://treeherder.mozilla.org/#/jobs?repo=try&revision=905816c4135b https://treeherder.mozilla.org/#/jobs?repo=try&revision=3aab33939fc2

Keywords: checkin-needed

Ryan VanderMeulen [:RyanVM]

Comment 17

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/a437d18a4b9b

Keywords: checkin-needed

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 18

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/a437d18a4b9b

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla37

You need to log in before you can comment on or make changes to this bug.