Enable SSE optimization in the in-tree libpng

RESOLVED FIXED in Firefox 51

Status

()

enhancement
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: glennrp+bmo, Assigned: glennrp+bmo)

Tracking

Trunk
mozilla51
x86
All
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox49 wontfix, firefox51 fixed)

Details

Attachments

(2 attachments, 15 obsolete attachments)

14.12 KB, patch
jrmuizel
: review+
Details | Diff | Splinter Review
8.67 KB, patch
jrmuizel
: review+
Details | Diff | Splinter Review
Libpng-1.6.22 offers optimized support for Intel-SSE platforms.  It's not enabled by default.
Assignee: nobody → glennrp+bmo
Status: NEW → ASSIGNED
Depends on: 1275901
In bug # 1275901 (update embedded libpng to version 1.6.23) comment #14 Seth commented:

I did want to bring to your attention that for now, at least, we are still supporting machines that have SSE but not SSE2. I just wanted to make sure you've thought about this, as we've seen some nasty issues related to running SSE2 code on SSE-only machines in the past.
Please submit all 4 parts together to try.
Flags: needinfo?(ryanvm)
Seth wrote in comment #1
> I did want to bring to your attention that for now, at least, we are still
> supporting machines that have SSE but not SSE2. I just wanted to make sure
> you've thought about this, as we've seen some nasty issues related to
> running SSE2 code on SSE-only machines in the past.

There is code in pngpriv.h that tries to sort out the various levels of SSE support.
applying v00-1276127-part01-add-sse-directory-to-intree-libpng.diff
unable to find 'media/libpng/sse2/intel_init.c' for patching
Flags: needinfo?(ryanvm)
diff against /dev/null instead of existing media/libpng/sse2 directory
Please retry all parts together
Attachment #8775784 - Attachment is obsolete: true
Flags: needinfo?(ryanvm)
Simplified SSE and ARM selection in old-configure.in
Attachment #8775787 - Attachment is obsolete: true
Further simplify SSE/ARM selection in old-configure.in
Attachment #8776415 - Attachment is obsolete: true
Conditionally compile sse2 headers
Attachment #8775785 - Attachment is obsolete: true
https://treeherder.mozilla.org/#/jobs?repo=try&revision=efa537a529fb

Sorry for the extra commits in that patch queue. They're not going to hurt anything.
Flags: needinfo?(ryanvm)
Comment on attachment 8776423 [details] [diff] [review]
v00-1276127-part06-sse-changes

The commit message calls this part 5
Attachment #8776423 - Flags: feedback-
Fixed typo in part06 checkin message
Attachment #8776423 - Flags: feedback-
Attachment #8776423 - Attachment is obsolete: true
Comment on attachment 8775998 [details] [diff] [review]
v01-1276127-part01-add-sse-directory-to-intree-libpng

Try is all green and logs show SSE2 support was built.
Attachment #8775998 - Flags: review?(tnikkel)
Attachment #8776439 - Flags: review?(tnikkel)
Attachment #8775786 - Flags: review?(tnikkel)
Attachment #8776422 - Flags: review?(tnikkel)
Attachment #8776218 - Flags: review?(tnikkel)
Attachment #8776721 - Flags: review?(tnikkel)
Comment on attachment 8775786 [details] [diff] [review]
v00-1276127-part03-sse-mozbuild

Redirecting to Jeff.
Attachment #8775786 - Flags: review?(tnikkel) → review?(jmuizelaar)
Attachment #8775998 - Flags: review?(tnikkel) → review?(jmuizelaar)
Attachment #8776218 - Flags: review?(tnikkel) → review?(jmuizelaar)
Attachment #8776422 - Flags: review?(tnikkel) → review?(jmuizelaar)
Attachment #8776439 - Flags: review?(tnikkel) → review?(jmuizelaar)
Attachment #8776721 - Flags: review?(tnikkel) → review?(jmuizelaar)
Rebased.
Attachment #8776422 - Attachment is obsolete: true
Attachment #8776422 - Flags: review?(jmuizelaar)
Split from part04
Attachment #8777869 - Flags: review?(jmuizelaar)
Attachment #8777870 - Flags: review?(jmuizelaar)
Comment on attachment 8775786 [details] [diff] [review]
v00-1276127-part03-sse-mozbuild

Review of attachment 8775786 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/libpng/moz.build
@@ +37,5 @@
>      SOURCES += [
>          'arm/filter_neon.S'
>      ]
>  
> +if CONFIG['MOZ_PNG_INTEL_SSE']:

I think this can be:
if CONFIG['INTEL_ARCHITECTURE']:

I wasn't able to find out what's supposed to set MOZ_PNG_INTEL_SSE
Attachment #8775786 - Flags: review?(jmuizelaar) → review-
(In reply to Jeff Muizelaar [:jrmuizel] from comment #21)
> Comment on attachment 8775786 [details] [diff] [review]
> v00-1276127-part03-sse-mozbuild
> 
> Review of attachment 8775786 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: media/libpng/moz.build
> @@ +37,5 @@
> >      SOURCES += [
> >          'arm/filter_neon.S'
> >      ]
> >  
> > +if CONFIG['MOZ_PNG_INTEL_SSE']:
> 
> I think this can be:
> if CONFIG['INTEL_ARCHITECTURE']:
> 
> I wasn't able to find out what's supposed to set MOZ_PNG_INTEL_SSE

It's set in old-configure.in (patch part 04) and can be disabled
with  --enable-png-intel-sse-support=(yes/no)
Move --enable-png-sse-support (and --enable-png-arm-neon-support) test outside the "case CPU_ARCH" test so user can override the result.  So:
PNG_INTEL_SSE_SUPPORT is 0 by default, then 1 on x86 or x86-64, and finally 0 or 1 if --enable/--disable is present.
Attachment #8777869 - Attachment is obsolete: true
Attachment #8777869 - Flags: review?(jmuizelaar)
(In reply to Glenn Randers-Pehrson from comment #23)
> Created attachment 8778023 [details] [diff] [review]
> v03-1276127-part04-sse-old.configure
> 
> Move --enable-png-sse-support (and --enable-png-arm-neon-support) test
> outside the "case CPU_ARCH" test so user can override the result.  So:
> PNG_INTEL_SSE_SUPPORT is 0 by default, then 1 on x86 or x86-64, and finally
> 0 or 1 if --enable/--disable is present.

Is there any reason we would want to disable SSE support?
Flags: needinfo?(glennrp+bmo)
We might want to allow users to disable SSE support for debugging or performance testing purposes.
Flags: needinfo?(glennrp+bmo)
(In reply to Glenn Randers-Pehrson from comment #25)
> We might want to allow users to disable SSE support for debugging or
> performance testing purposes.

There's no precedent for this in other libraries that we support. I think being able to to disable it from the moz.build files is probably sufficient.

Also, do you mind posting the performance results from these changes?
There's precedent in the way ARM-NEON support has been handled by the PNG decoder, but I can easily get rid of the enable/disable feature for both ARM-NEON and INTEL-SSE2.  I don't have performance results for Firefox but I've tested with an instrumented version of pngcrush, but I've got to rerun those with libpng-1.6.24 now.
Rebased old-configure.in
Attachment #8778023 - Attachment is obsolete: true
(In reply to Jeff Muizelaar [:jrmuizel] from comment #26)
> Also, do you mind posting the performance results from these changes?

Here are some.  There's a bigger effect with PAETH-filtered images, and
not much effect on typical icons:

pngcrush_183_uxxx Not SSE2 Optimized
pngcrush_183_usxx SSE2 Optimized
HRT means "High resolution timing", i.e., clock_gettime().

Linux 4.4.0-31-generic #50-Ubuntu
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.1) 5.4.0 20160609

Timing tests for firefox linux themes (10 times *.png)
pngcrush_183_uxxx: HRT time decode 0.2474, other 0.0739, total 0.3300 sec
pngcrush_183_usxx: HRT time decode 0.2444, other 0.0715, total 0.3245 sec

Timing tests for firefox branding (20 times *.png)
pngcrush_183_uxxx: HRT time decode 0.2300, other 0.0707, total 0.3096 sec
pngcrush_183_usxx: HRT time decode 0.2098, other 0.0545, total 0.2731 sec

Timing tests for W-rgb-M147.png (100000x159, PAETH filtered)
pngcrush_183_uxxx: HRT time decode 0.5648, other 0.1914, total 0.7563 sec
pngcrush_183_usxx: HRT time decode 0.4401, other 0.1927, total 0.6328 sec

Timing tests for W-rgba-M147.png (100000x159, PAETH filtered)
pngcrush_183_uxxx: HRT time decode 0.7682, other 0.2513, total 1.0195 sec
pngcrush_183_usxx: HRT time decode 0.5496, other 0.2524, total 0.8020 sec

SunOS 5.10 Generic_147441-13 i86pc i386 i86pc
gcc (GCC) 4.7.1

Timing tests for firefox linux themes (10 times *.png)
pngcrush_183_uxxx: HRT time decode 0.2910, other 0.0434, total 0.3451 sec
pngcrush_183_usxx: HRT time decode 0.2848, other 0.0437, total 0.3391 sec

Timing tests for firefox branding (20 times *.png)
pngcrush_183_uxxx: HRT time decode 0.2499, other 0.0367, total 0.2985 sec
pngcrush_183_usxx: HRT time decode 0.2467, other 0.0364, total 0.2948 sec

Timing tests for W-rgb-M147.png (100000x159, PAETH filtered)
pngcrush_183_uxxx: HRT time decode 0.7805, other 0.1473, total 0.9279 sec
pngcrush_183_usxx: HRT time decode 0.7699, other 0.1456, total 0.9156 sec

Timing tests for W-rgba-M147.png (100000x159, PAETH filtered)
pngcrush_183_uxxx: HRT time decode 1.0327, other 0.1962, total 1.2290 sec
pngcrush_183_usxx: HRT time decode 0.8255, other 0.1940, total 1.0196 sec
(In reply to Glenn Randers-Pehrson from comment #27)
> There's precedent in the way ARM-NEON support has been handled by the PNG
> decoder, but I can easily get rid of the enable/disable feature for both
> ARM-NEON and INTEL-SSE2.

Yeah, let's not add it for INTEL-SSE2. You can drop the ARM-NEON stuff in another bug. The patches mostly look good. In the next version of patch please only separate out the parts that are actually independent. e.g there's no point in splitting out the licensing changes from the other ones. I expect the whole thing probably makes the most sense as a single patch.
Update to libpng-1.6.24
Attachment #8775998 - Attachment is obsolete: true
Attachment #8775998 - Flags: review?(jmuizelaar)
Attachment #8780163 - Flags: review?(jmuizelaar)
Combined patches, update to libpng-1.6.24, remove configuration and use INTEL_ARCHITECTURE instead.

Need another try run since so much is changed.
Attachment #8775786 - Attachment is obsolete: true
Attachment #8776218 - Attachment is obsolete: true
Attachment #8776439 - Attachment is obsolete: true
Attachment #8776721 - Attachment is obsolete: true
Attachment #8777870 - Attachment is obsolete: true
Attachment #8778671 - Attachment is obsolete: true
Attachment #8776218 - Flags: review?(jmuizelaar)
Attachment #8776439 - Flags: review?(jmuizelaar)
Attachment #8776721 - Flags: review?(jmuizelaar)
Attachment #8777870 - Flags: review?(jmuizelaar)
Flags: needinfo?(ryanvm)
Attachment #8780164 - Flags: review?(jmuizelaar)
Comment on attachment 8780164 [details] [diff] [review]
v00-1276127-part02-sse-intree-libpng

Review of attachment 8780164 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.
Attachment #8780164 - Flags: review?(jmuizelaar) → review+
Comment on attachment 8780163 [details] [diff] [review]
v02-1276127-part01-add-sse-directory-to-intree-libpng

Review of attachment 8780163 [details] [diff] [review]:
-----------------------------------------------------------------

It would be fine to have joined both of these patches as one. Please use a single commit.
Attachment #8780163 - Flags: review?(jmuizelaar) → review+
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/cdd3e16aebd4
Add SSE support to in-tree libpng. r=jrmuizel
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/cdd3e16aebd4
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Blocks: 1296946
This seems to have had no discernible effect on arewefastyet.com in mid-August (I don't know if AWFY tests png decoding or not, but it doesn't look that way).
arewefastyet only measures JS performance
You need to log in before you can comment on or make changes to this bug.