Speed up JPEG decoding by another 10%

VERIFIED FIXED in mozilla1.9beta3

Status

()

Core
ImageLib
P1
normal
VERIFIED FIXED
9 years ago
8 years ago

People

(Reporter: Alfred Kayser, Assigned: Alfred Kayser)

Tracking

({perf})

Trunk
mozilla1.9beta3
Points:
---
Bug Flags:
blocking1.9 +

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 3 obsolete attachments)

(Assignee)

Description

9 years ago
Created attachment 297517 [details] [diff] [review]
Prototype 

Currently the JPEB library convert the source color space to RGB (3bytes/pixel), after which the JPEG decoder then converts this to the packed Cairo format (4bytes/pixel).  Much more efficient would be to generate directly into the Cairo format.

This could be done in jpeglib, but for some systems, the 'system jpeglib' is used, and that won't work. Easier is to have the JPEG decoder after the jpeglib as initialized itself, override the color_convert function with one that can convert from the source color space directly into Cairo. As the most common (99.99% of the JPEG's out there) color space is YCbCr, we only need to provide a conversion routine for that into packed RGB.

This only requires a small change in the JPEG decoder.

Net effect is that in a simple 'jpegtest.html' test, about 10% speeddifference is noted (from avg. 66ms to 56ms). 

Also, maybe a SSE2 guru can optimize this routine (based on the material done on the YCbCr->RGB and RGB to PackedRGB)?

Note, many thanks to mmoy for the static tables, which I have shamefully copied into this patch.
Attachment #297517 - Flags: review?(pavlov)

Comment 1

9 years ago
This seems like a pretty neat solution.

tmp0 = _mm_unpacklo_epi8(row_R, row_G);    /* RGRG RGRG RGRG RGRG */
tmp1 = _mm_unpacklo_epi8(row_B, zeroes);   /* B0B0 B0B0 B0B0 B0B0 */
tmp2 = _mm_unpacklo_epi16(tmp0, tmp1);     /* RGB0 RGB0 RGB0 RGB0 (low)  */
tmp3 = _mm_unpackhi_epi16(tmp0, tmp1);     /* RGB0 RGB0 RGB0 RGB0 (high) */

This code takes the output of the math processing which is three color vectors
and merges them back into four-byte pixels. The order can be changed by switching row_R with row_B and using a vector with all bits on instead of all zeroes. Then the next 23 lines of code (almost half of the SIMD routine) which convert four-byte pixels to three-byte pixels can then be tossed. I'll try to do this tonight.

Also, it would be nice to have only one copy of the static tables.

Comment 2

9 years ago
The new code would look like this. It's like playing with Legos.

tmp0 = _mm_unpacklo_epi8(ones, row_B);    /* ABAB ABAB ABAB ABAB */
tmp1 = _mm_unpacklo_epi8(row_G, row_R);   /* GRGR GRGR GRGR GRGR */
tmp2 = _mm_unpacklo_epi16(tmp0, tmp1);    /* ABGR ABGR ABGR ABGR (low)  */
tmp3 = _mm_unpackhi_epi16(tmp0, tmp1);    /* ABGR ABGR ABGR ABGR (high) */

(Assignee)

Comment 3

9 years ago
Interesting: this could really speed things up.

One set of tables would be nice, but where to put them? libjpeg could be a 'system library' so coming from somewhere without these tables.
Version: unspecified → Trunk

Comment 4

9 years ago
Moving to the blocking list - we'd love to get this in before Beta3 folks!

Flags: blocking1.9+
Priority: -- → P1

Comment 5

9 years ago
Perhaps making it global in nsJPEGDecoder? That way, the Mozilla version would have access to it.

It sounds like Mozilla's jpeg code in /jpeg isn't compiled and linked in when libjpeg is used which means that the detection code for SSE2 capability has to be moved for nsJPEGDecoder to use it. Do you think that it's reasonable to put it in nsJPEGDecoder?

Comment 6

9 years ago
I found a 17% improvement on Windows from Alfred's two patches. This bug and the input buffer bug.

Comment 7

9 years ago
Comment on attachment 297517 [details] [diff] [review]
Prototype 

ok, this patch needs to not have the patch from 411379 in it.  lets get 411718 in first, then 411379, then this.

Comment 8

9 years ago
(In reply to comment #7)
> (From update of attachment 297517 [details] [diff] [review])
> ok, this patch needs to not have the patch from 411379 in it.  lets get 411718
> in first, then 411379, then this.  
> 

Alfred you want to make a new patch without the changes in 411379 since that already landed?
(Assignee)

Comment 9

9 years ago
Created attachment 299712 [details] [diff] [review]
V2: Updated to current trunk, and removed mUseCairo

Note:
The 'ARGB' converter is only used in the most common case: in colorspace = JCS_YCbCr, and out colorspace = RGB, and no CMS transformations applied.
Attachment #297517 - Attachment is obsolete: true
Attachment #299712 - Flags: review?
Attachment #297517 - Flags: review?(pavlov)
(Assignee)

Updated

9 years ago
Attachment #299712 - Flags: review? → review?(pavlov)
(Assignee)

Comment 10

9 years ago
Created attachment 299945 [details] [diff] [review]
Real cvs diff of V2
Attachment #299712 - Attachment is obsolete: true
Attachment #299945 - Flags: review?(pavlov)
Attachment #299712 - Flags: review?(pavlov)

Comment 11

9 years ago
Comment on attachment 299945 [details] [diff] [review]
Real cvs diff of V2

+/**************** YCbCr -> Ciaro's RGB24/ARGB32 conversion: most common case **************/

should be just cairo and not Ciaro ;)
(Assignee)

Comment 12

9 years ago
Created attachment 300081 [details] [diff] [review]
V2: spellingfix: Ciaro -> Cairo
Attachment #299945 - Attachment is obsolete: true
Attachment #300081 - Flags: review?(pavlov)
Attachment #299945 - Flags: review?(pavlov)

Updated

9 years ago
Attachment #300081 - Flags: review?(pavlov) → review+
Keywords: checkin-needed
Checking in modules/libpr0n/decoders/jpeg/nsJPEGDecoder.cpp;
/cvsroot/mozilla/modules/libpr0n/decoders/jpeg/nsJPEGDecoder.cpp,v  <--  nsJPEGDecoder.cpp
new revision: 1.88; previous revision: 1.87
done
Status: ASSIGNED → RESOLVED
Last Resolved: 9 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9beta3
This broke --with-system-jpeg for me.
So if --with-system-jpeg is not supported anymore it needs to be removed from configure or if it is with newer jpeg versions the configure check needs to be updated IMHO.

Comment 15

9 years ago
we need to support building with the system jpeg library.  we should probably not do this if we're not building with in-tree.
(Assignee)

Comment 16

9 years ago
Can someone tell me how this breaks --with-system-jpeg?
Could it be that
  #include "jpegint.h" 
doesn't work, as jpegint.h for system-jpeg builds is not available?

The basic idea was that this should also work with system-jpeg enabled.
Alfred, yes, missing jpegint.h is the build error.
(Assignee)

Comment 18

9 years ago
Created attachment 300344 [details] [diff] [review]
Replace include "jpegint.h" with some locally copied definitions...

Just copy the definitions we need to override the deconverter.
Note this is a safe thing to do, as libjpeg hasn't changed for years now.

Can this be tested if this helps to build with system jpeg?

Comment 19

9 years ago
So none of the code from the jpeg library is available with --with-system-jpeg?
(Assignee)

Comment 20

9 years ago
with --with-system-jpeg, the library is dynamically linked in from a system location, so the compiled code is available in that sense.
But any extensions and/or changes that are made to the code in the mozilla/jpeg directory are not available for those that build --with-system-jpeg.
(In reply to comment #18)
> Created an attachment (id=300344) [details]
> Can this be tested if this helps to build with system jpeg?

My clobber build just got past this, --with-system-jpeg

(Assignee)

Comment 22

9 years ago
Comment on attachment 300344 [details] [diff] [review]
Replace include "jpegint.h" with some locally copied definitions...

Stuart, can you review this followup patch to get the system-jpeg builds compiled? Or it is better to create a new bug for this?
Attachment #300344 - Flags: review?(pavlov)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Updated

9 years ago
Attachment #300344 - Flags: review?(pavlov) → review+
Attachment #300344 - Flags: approval1.9b3?
Comment on attachment 300344 [details] [diff] [review]
Replace include "jpegint.h" with some locally copied definitions...

a=beltzner for beta3, safe fix that moves a few defines
Attachment #300344 - Flags: approval1.9b3? → approval1.9b3+
Checking in modules/libpr0n/decoders/jpeg/nsJPEGDecoder.cpp;
/cvsroot/mozilla/modules/libpr0n/decoders/jpeg/nsJPEGDecoder.cpp,v  <--  nsJPEGDecoder.cpp
new revision: 1.90; previous revision: 1.89
done
Status: REOPENED → RESOLVED
Last Resolved: 9 years ago9 years ago
Resolution: --- → FIXED
(Assignee)

Updated

8 years ago
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.