Open Bug 462796 Opened 16 years ago Updated 2 years ago

Add ARM optimizations to image decoders

Categories

(Core :: Graphics: ImageLib, defect)

ARM
Maemo
defect

People

(Reporter: pavlov, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 1 obsolete file)

It would be great if we could do ARM optimized JPEG and PNG decoding, similar to the MMX and SSE2 code we have for them now.
Android has ARM optimizations for libjpeg, but they appear to be Apache licensed. We should talk to Google about getting them relicensed under the libjpeg license.
Depends on: 489148
Stuart et al., we're seeing a pretty significant perf improvement with libjpeg-turbo on x86 (bug 573948); it looks like the hot path is running about 2.5x faster on all platforms. I imagine you'll see something similar with a good NEON library.
Depends on: 655693
libpng recently added some ARM assembler code.

Latest libpng 1.5.9 just landed in Firefox, so it's a good time to try it.

To test it, you would need to define PNG_ARM_NEON in mozpngconf.h
and include the arm/filter_neon.S file from the libpng sources.
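
For orientation, the work that filter code of this kind speeds up is the per-row filter reversal defined in the PNG specification (None, Sub, Up, Average, Paeth). Below is a minimal scalar sketch of that operation, written in Python for readability. It is not taken from libpng, and the names `paeth` and `unfilter_row` are hypothetical.

```python
def paeth(a, b, c):
    """Paeth predictor: pick whichever of left (a), up (b), upper-left (c)
    is closest to a + b - c, preferring a, then b, then c on ties."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def unfilter_row(ftype, row, prev, bpp):
    """Reverse one PNG scanline filter.

    ftype: filter type byte (0=None, 1=Sub, 2=Up, 3=Average, 4=Paeth)
    row:   filtered bytes of this scanline, without the filter byte
    prev:  reconstructed previous scanline (all zeros for the first row)
    bpp:   bytes per complete pixel, minimum 1
    """
    out = bytearray(row)
    for i in range(len(out)):
        a = out[i - bpp] if i >= bpp else 0   # left (already reconstructed)
        b = prev[i]                           # up
        c = prev[i - bpp] if i >= bpp else 0  # upper-left
        if ftype == 0:
            pred = 0
        elif ftype == 1:
            pred = a
        elif ftype == 2:
            pred = b
        elif ftype == 3:
            pred = (a + b) // 2
        elif ftype == 4:
            pred = paeth(a, b, c)
        else:
            raise ValueError("unknown filter type %d" % ftype)
        out[i] = (out[i] + pred) & 0xFF
    return bytes(out)
```

Each output byte depends on the byte `bpp` positions to its left, which is why the Sub, Average and Paeth cases are harder to vectorize than Up.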
Hi all,

   You can try the external/libpng and external/zlib fast NEON patches provided by Code Aurora Forum. You can find the code in the FFOS unagi device. These patches can improve PNG decode performance by about 15%.
(In reply to james.zhang from comment #4)
> Hi all,
> 
>    You can try external/libpng and external/zlib fast neon patch, provided
> by Code Aurora Forum. You can find the code in FFOS unagi device. These
> patch can improve 15% png decode performance.

Can you provide a link to these patches?
> Can you provide a link to these patches?

I think he's referring to code which lives in the B2G tree under the "external" directory.

The git repositories are

  git://codeaurora.org/platform/external/zlib and
  git://codeaurora.org/platform/external/libpng

I don't know if there are web interfaces to those repositories.  They both appear to have a bit of NEON code in them (git grep neon).
libpng and zlib neon patch
Attached patch png_read_filter_row_neon.S (obsolete)
add png_read_filter_row_neon.S
add inflate_fast_copy_neon.S
Please refer to the attachment; we have verified the patch on Android.
Glenn, DRC:

Have you seen these patches before (from CodeAurora)? Would you like to roll them upstream?
Oh, sorry - I thought there was a libjpeg part of this, but it's libpng only.
(In reply to Joe Drew (:JOEDREW! \o/) from comment #11)
> Glenn, DRC:
> 
> Have you seen these patches before (from CodeAurora)? Would you like to roll
> them upstream?

See comment #3.  This implementation looks to be equivalent to the one in libpng's arm directory, but not copied from it or based upon it.  I can't tell by looking which is better, although this one is much more commented.
The unagi device also has a libjpeg NEON patch. I'll provide the patch on B2G later.
git://codeaurora.org/platform/external/jpeg
(In reply to james.zhang from comment #14)
> unagi device also has libjpeg neon patch. I'll provide the patch on B2G
> later.
> git://codeaurora.org/platform/external/jpeg

Note that we use libjpeg-turbo, which has extensive NEON optimizations.

I'd be surprised if the CA code is faster than libjpeg-turbo, although if it is, I imagine DRC would be interested.
(In reply to Justin Lebar [:jlebar] from comment #15)
> (In reply to james.zhang from comment #14)
> > unagi device also has libjpeg neon patch. I'll provide the patch on B2G
> > later.
> > git://codeaurora.org/platform/external/jpeg
> 
> Note that we use libjpeg-turbo, which has extensive NEON optimizations.
> 
> I'd be surprised if the CA code is faster than libjpeg-turbo, although if it
> is, I imagine DRC would be interested.

I think the CA code optimizes different functions than libjpeg-turbo does, so the two can both have an effect.
(In reply to Justin Lebar [:jlebar] from comment #15)
> (In reply to james.zhang from comment #14)
> > unagi device also has libjpeg neon patch. I'll provide the patch on B2G
> > later.
> > git://codeaurora.org/platform/external/jpeg
> 
> Note that we use libjpeg-turbo, which has extensive NEON optimizations.
> 
> I'd be surprised if the CA code is faster than libjpeg-turbo, although if it
> is, I imagine DRC would be interested.
Sorry, libjpeg-turbo and the CA code optimize the same functions. I'll compare their performance and choose the better one.
Depends on: 832390
James, do you have any performance numbers to report?
(In reply to Jeff Muizelaar [:jrmuizel] from comment #18)
> James, do you have any performance numbers to report?

About a 15% performance improvement in PNG decode.
Is the 15% perf. improvement taken from Comment 4 or is it based on a new benchmark you did because of Jeff's question in Comment 18?
(In reply to Jorge Quiñónez from comment #20)
> Is the 15% perf. improvement taken from Comment 4 or is it based on a new
> benchmark you did because of Jeff's question in Comment 18?

Taken from Comment 4. We verified this patch with the Antutu benchmark on Android, and by testing decode of large PNG images. We ran these benchmarks last year.
It would be good to know how much of the performance improvement was due to the libpng patch and how much was due to the zlib patch.  In the usual case where PNG scanline filters are all NONE, the libpng patch would provide no improvement.
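
One way to check how often the libpng side can matter for a given set of images is to count the filter type byte at the start of each scanline. The following is a rough sketch using only the Python standard library. `filter_histogram` is a hypothetical helper, it handles non-interlaced images only, and it does not verify chunk CRCs.

```python
import struct
import zlib
from collections import Counter

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Samples per pixel for each PNG color type.
CHANNELS = {0: 1, 2: 3, 3: 1, 4: 2, 6: 4}


def filter_histogram(png_bytes):
    """Count scanline filter types (0..4) in a non-interlaced PNG."""
    if png_bytes[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos, idat, header = 8, [], None
    while pos < len(png_bytes):
        length, ctype = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        data = png_bytes[pos + 8:pos + 8 + length]
        pos += 12 + length  # length + type + data + CRC
        if ctype == b"IHDR":
            header = struct.unpack(">IIBBBBB", data)
        elif ctype == b"IDAT":
            idat.append(data)
        elif ctype == b"IEND":
            break
    if header is None:
        raise ValueError("missing IHDR")
    width, height, depth, color, _, _, interlace = header
    if interlace != 0:
        raise ValueError("interlaced PNGs are not handled in this sketch")
    # Bytes per scanline, rounded up for bit depths below 8.
    row_bytes = (width * CHANNELS[color] * depth + 7) // 8
    raw = zlib.decompress(b"".join(idat))
    stride = row_bytes + 1  # one filter type byte precedes each row
    return Counter(raw[y * stride] for y in range(height))
```

If the histogram for a test image shows only type 0 (None), any speedup measured on that image would have to come from the zlib side rather than from the filter code.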
Depends on: 841734
Comment on attachment 690722
png_read_filter_row_neon.S

This patch was made obsolete by checkin of libpng-1.5.17, bug #886499.
Attachment #690722 - Attachment is obsolete: true
Hardware: x86 → ARM
Severity: normal → S3