Closed Bug 412753 Opened 17 years ago Closed 17 years ago

Speed up JPEG decoding by another 10%

Categories

(Core :: Graphics: ImageLib, defect, P1)

defect

Tracking

()

VERIFIED FIXED
mozilla1.9beta3

People

(Reporter: alfredkayser, Assigned: alfredkayser)

Details

(Keywords: perf)

Attachments

(2 files, 3 obsolete files)

Attached patch Prototype (obsolete) — Splinter Review
Currently the JPEB library convert the source color space to RGB (3bytes/pixel), after which the JPEG decoder then converts this to the packed Cairo format (4bytes/pixel).  Much more efficient would be to generate directly into the Cairo format.

This could be done in jpeglib, but for some systems, the 'system jpeglib' is used, and that won't work. Easier is to have the JPEG decoder after the jpeglib as initialized itself, override the color_convert function with one that can convert from the source color space directly into Cairo. As the most common (99.99% of the JPEG's out there) color space is YCbCr, we only need to provide a conversion routine for that into packed RGB.

This only requires a small change in the JPEG decoder.

Net effect is that in a simple 'jpegtest.html' test, about 10% speeddifference is noted (from avg. 66ms to 56ms). 

Also, maybe a SSE2 guru can optimize this routine (based on the material done on the YCbCr->RGB and RGB to PackedRGB)?

Note, many thanks to mmoy for the static tables, which I have shamefully copied into this patch.
Attachment #297517 - Flags: review?(pavlov)
This seems like a pretty neat solution.

tmp0 = _mm_unpacklo_epi8(row_R, row_G);    /* RGRG RGRG RGRG RGRG */
tmp1 = _mm_unpacklo_epi8(row_B, zeroes);   /* B0B0 B0B0 B0B0 B0B0 */
tmp2 = _mm_unpacklo_epi16(tmp0, tmp1);     /* RGB0 RGB0 RGB0 RGB0 (low)  */
tmp3 = _mm_unpackhi_epi16(tmp0, tmp1);     /* RGB0 RGB0 RGB0 RGB0 (high) */

This code takes the output of the math processing which is three color vectors
and merges them back into four-byte pixels. The order can be changed by switching row_R with row_B and using a vector with all bits on instead of all zeroes. Then the next 23 lines of code (almost half of the SIMD routine) which convert four-byte pixels to three-byte pixels can then be tossed. I'll try to do this tonight.

Also, it would be nice to have only one copy of the static tables.
The new code would look like this. It's like playing with Legos.

tmp0 = _mm_unpacklo_epi8(ones, row_B);    /* ABAB ABAB ABAB ABAB */
tmp1 = _mm_unpacklo_epi8(row_G, row_R);   /* GRGR GRGR GRGR GRGR */
tmp2 = _mm_unpacklo_epi16(tmp0, tmp1);    /* ABGR ABGR ABGR ABGR (low)  */
tmp3 = _mm_unpackhi_epi16(tmp0, tmp1);    /* ABGR ABGR ABGR ABGR (high) */

Interesting: this could really speed things up.

One set of tables would be nice, but where to put them? libjpeg could be a 'system library' so coming from somewhere without these tables.
Version: unspecified → Trunk
Moving to the blocking list - we'd love to get this in before Beta3 folks!

Flags: blocking1.9+
Priority: -- → P1
Perhaps making it global in nsJPEGDecoder? That way, the Mozilla version would have access to it.

It sounds like Mozilla's jpeg code in /jpeg isn't compiled and linked in when libjpeg is used which means that the detection code for SSE2 capability has to be moved for nsJPEGDecoder to use it. Do you think that it's reasonable to put it in nsJPEGDecoder?
I found a 17% improvement on Windows from Alfred's two patches. This bug and the input buffer bug.
Comment on attachment 297517 [details] [diff] [review]
Prototype 

ok, this patch needs to not have the patch from 411379 in it.  lets get 411718 in first, then 411379, then this.
(In reply to comment #7)
> (From update of attachment 297517 [details] [diff] [review])
> ok, this patch needs to not have the patch from 411379 in it.  lets get 411718
> in first, then 411379, then this.  
> 

Alfred you want to make a new patch without the changes in 411379 since that already landed?
Note:
The 'ARGB' converter is only used in the most common case: in colorspace = JCS_YCbCr, and out colorspace = RGB, and no CMS transformations applied.
Attachment #297517 - Attachment is obsolete: true
Attachment #299712 - Flags: review?
Attachment #297517 - Flags: review?(pavlov)
Attachment #299712 - Flags: review? → review?(pavlov)
Attached patch Real cvs diff of V2 (obsolete) — Splinter Review
Attachment #299712 - Attachment is obsolete: true
Attachment #299945 - Flags: review?(pavlov)
Attachment #299712 - Flags: review?(pavlov)
Comment on attachment 299945 [details] [diff] [review]
Real cvs diff of V2

+/**************** YCbCr -> Ciaro's RGB24/ARGB32 conversion: most common case **************/

should be just cairo and not Ciaro ;)
Attachment #299945 - Attachment is obsolete: true
Attachment #300081 - Flags: review?(pavlov)
Attachment #299945 - Flags: review?(pavlov)
Attachment #300081 - Flags: review?(pavlov) → review+
Keywords: checkin-needed
Checking in modules/libpr0n/decoders/jpeg/nsJPEGDecoder.cpp;
/cvsroot/mozilla/modules/libpr0n/decoders/jpeg/nsJPEGDecoder.cpp,v  <--  nsJPEGDecoder.cpp
new revision: 1.88; previous revision: 1.87
done
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9beta3
This broke --with-system-jpeg for me.
So if --with-system-jpeg is not supported anymore it needs to be removed from configure or if it is with newer jpeg versions the configure check needs to be updated IMHO.
we need to support building with the system jpeg library.  we should probably not do this if we're not building with in-tree.
Can someone tell me how this breaks --with-system-jpeg?
Could it be that
  #include "jpegint.h" 
doesn't work, as jpegint.h for system-jpeg builds is not available?

The basic idea was that this should also work with system-jpeg enabled.
Alfred, yes, missing jpegint.h is the build error.
Just copy the definitions we need to override the deconverter.
Note this is a safe thing to do, as libjpeg hasn't changed for years now.

Can this be tested if this helps to build with system jpeg?
So none of the code from the jpeg library is available with --with-system-jpeg?
with --with-system-jpeg, the library is dynamically linked in from a system location, so the compiled code is available in that sense.
But any extensions and/or changes that are made to the code in the mozilla/jpeg directory are not available for those that build --with-system-jpeg.
(In reply to comment #18)
> Created an attachment (id=300344) [details]
> Can this be tested if this helps to build with system jpeg?

My clobber build just got past this, --with-system-jpeg

Comment on attachment 300344 [details] [diff] [review]
Replace include "jpegint.h" with some locally copied definitions...

Stuart, can you review this followup patch to get the system-jpeg builds compiled? Or it is better to create a new bug for this?
Attachment #300344 - Flags: review?(pavlov)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #300344 - Flags: review?(pavlov) → review+
Attachment #300344 - Flags: approval1.9b3?
Comment on attachment 300344 [details] [diff] [review]
Replace include "jpegint.h" with some locally copied definitions...

a=beltzner for beta3, safe fix that moves a few defines
Attachment #300344 - Flags: approval1.9b3? → approval1.9b3+
Checking in modules/libpr0n/decoders/jpeg/nsJPEGDecoder.cpp;
/cvsroot/mozilla/modules/libpr0n/decoders/jpeg/nsJPEGDecoder.cpp,v  <--  nsJPEGDecoder.cpp
new revision: 1.90; previous revision: 1.89
done
Status: REOPENED → RESOLVED
Closed: 17 years ago17 years ago
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: