416157 - Add JPEG SSE2 color processing for ycc_rgb_convert_argb

Assignee

Description

•

18 years ago

Add SSE2 processing for this routine for the Mac OSX Intel and Windows (32-bit) platforms. I've tested the code on Mac OSX and need to test it on Windows.

Michael Moy

Assignee

Comment 1

•

18 years ago

Attached patch First pass at the patch (obsolete) — Details — Splinter Review

Works on Mac OSX. I need to test it on Windows.

Michael Moy

Assignee

Comment 2

•

18 years ago

Attached patch SSE2 JPEG color optimization (obsolete) — Details — Splinter Review

Added a few Windows fixes.

Attachment #301932 - Attachment is obsolete: true

jag (Peter Annema)

Comment 3

•

18 years ago

To make the code a bit easier to read, you could put something like this in gfxPlatform.h: #if defined(GFX_HAVE_SSE2_MNEMONICS) extern int SSE2Present; #elif defined(GFX_HAVE_SSE2_INTRINSICS) #define SSE2Present 1 #endif That would let you get rid of a whole bunch of guards. I would add a comment along the lines of "// always true for GFX_HAVE_SSE2_INTRINCICS" on the |== 1| test. Also, you could add: #if defined(GFX_HAVE_SSE2_MNEMONICS) && defined(GFX_HAVE_SSE2_INTRINSICS) #error Can't have both GFX_HAVE_SSE2_MNEMONICS and GFX_HAVE_SSE2_INTRINSICS #endif In gfxPlatform.cpp, shouldn't the |int SSE2Present = 0;| be outside the guard for #include <intrin.h>?

Michael Moy

Assignee

Comment 4

•

18 years ago

Attached patch Color patch with the SSE2 IFAST intrinsics optimizations for OSX (obsolete) — Details — Splinter Review

This patch adds the SSE2 IFAST optimization for Mac/OSX. I will write some comments on the code later this morning. It doesn't include the comments posted a little while ago.

Attachment #302057 - Attachment is obsolete: true

Michael Moy

Assignee

Comment 5

•

18 years ago

I removed SSE2Present in the 2008-02-11 04:35 PST attachment and did some cleanup in my development environment. I also added your suggestion regarding generating the error message. On your first suggested change in gfxPlatform.h, how do you use this without having the extern int included in gfxPlatform.cpp where it will be defined without an extern? (In reply to comment #3) > To make the code a bit easier to read, you could put something like this in > gfxPlatform.h: > > #if defined(GFX_HAVE_SSE2_MNEMONICS) > extern int SSE2Present; > #elif defined(GFX_HAVE_SSE2_INTRINSICS) > #define SSE2Present 1 > #endif > > That would let you get rid of a whole bunch of guards. I would add a comment > along the lines of "// always true for GFX_HAVE_SSE2_INTRINCICS" on the |== 1| > test. > > Also, you could add: > > #if defined(GFX_HAVE_SSE2_MNEMONICS) && defined(GFX_HAVE_SSE2_INTRINSICS) > #error Can't have both GFX_HAVE_SSE2_MNEMONICS and GFX_HAVE_SSE2_INTRINSICS > #endif > > In gfxPlatform.cpp, shouldn't the |int SSE2Present = 0;| be outside the guard > for #include <intrin.h>? >

Michael Moy

Assignee

Comment 6

•

18 years ago

I found a way to get around the self-include problem.

jag (Peter Annema)

Comment 7

•

18 years ago

(In reply to comment #5) > On your first suggested change in gfxPlatform.h, how do you use this without > having the extern int included in gfxPlatform.cpp where it will be defined > without an extern? Ah, good point. So I guess for _INTRINSICS you could #define SSE2Present 1 in gfxPlatform.h, and for _MNEMONICS you put extern int SSE2Present in nsJPEGDecoder.cpp like you're currently doing. Or you could put the #define in the .cpp file too if you don't want to pollute gfxPlatform.h. With that, you can get rid of the _MNEMONICS guards around |if (SSE2Present == 1) {| ... |} else {| ... |}|. The compiler should optimize out the test and else case for _INTRINSICS and it will make your code a bit easier to read. I would also make |__attribute__ ((aligned (16)))| a conditional #define. It's just easier to read and maintain this (with a better name?): JBLOCKROW MCU_buffer[D_MAX_BLOCKS_IN_MCU] SSE2_ALIGNED; #ifdef D_MULTISCAN_FILES_SUPPORTED /* In multi-pass modes, we need a virtual block array for each component. */ jvirt_barray_ptr whole_image[MAX_COMPONENTS] SSE2_ALIGNED; #endif ! D_MULTISCAN_FILES_SUPPORTED Index: jpeg/jddctmgr.c =================================================================== @@ -142,31 +152,23 @@ start_pass (j_decompress_ptr cinfo) -#ifdef HAVE_SSE2_INTEL_MNEMONICS - if (SSE2Available==1) Why are you taking out the case for HAVE_SSE2_INTEL_MNEMONICS? I could be wrong, but on Windows this change will cause us to not use SSE2 anymore. If you can use the #define SSE2Available 1 trick for MacOSX, I think we'd want something like this: #if defined(HAVE_SSE2_INTRINSICS) || defined(HAVE_SSE2_INTEL_MNEMONICS) if (SSE2Available == 1) { method_ptr = jpeg_idct_ifast_sse2; method = JDCT_IFAST; } else #else { method_ptr = jpeg_idct_ifast; method = JDCT_IFAST; } #endif Hrm, that file contains tabs. I don't know what our policy is for mozilla/jpeg/ but if you don't mind, could you expand those tabs (to two spaces each, it seems) for the code you touch? + buffer_shift = (PRUint32) + (*cinfo->mem->alloc_large) ((j_common_ptr) cinfo, JPOOL_IMAGE, + D_MAX_BLOCKS_IN_MCU * SIZEOF(JBLOCK) + 16); + buffer_shift = ((buffer_shift + 16) >> 4) << 4; + buffer = (JBLOCKROW) buffer_shift; I think you meant + 15 there (so if you're already aligned nothing changes). And use & ~15 here too. Too bad we don't have NS_PTR_ALIGN(p, a). Also, if you use PRUptrdiff for both of these you should be 64-bit safe. If it turns out you did want to keep _MNEMONICS above, make sure to put it back here. Index: jpeg/jdcoefct.c =================================================================== @@ -718,19 +730,32 @@ jinit_d_coef_controller (j_decompress_pt + compptr->dct_table = (void *) ((buffer_pointer + 16) & ~15); Same + 15 here. Index: modules/libpr0n/decoders/jpeg/nsJPEGDecoder.cpp =================================================================== @@ -512,17 +520,21 @@ nsresult nsJPEGDecoder::ProcessData(cons +#ifdef GFX_HAVE_SSE2_INTRINSICS + mInfo.dct_method = JDCT_IFAST; +#else mInfo.dct_method = JDCT_ISLOW; +#endif ! GFX_HAVE_SSE2_INTRINSICS Is the jpeg_idct_ifast_sse2 function as accurate as the jpeg_idct_islow one? Or are we ok with a loss of a bit of accuracy here? To nit-pick some more, could you make GFX_HAVE_SSE2_MNEMONICS be ..._INTEL_MNEMONICS, or change the one in jpeg to drop the _INTEL?

jag (Peter Annema)

Comment 8

•

18 years ago

(In reply to comment #6) > I found a way to get around the self-include problem. How did you solve it?

Michael Moy

Assignee

Comment 9

•

18 years ago

(In reply to comment #8) > (In reply to comment #6) > > I found a way to get around the self-include problem. > > How did you solve it? I was going to put a #define in the .cpp file before the #include gfxPlatform.h to indicate that this is the source and not the extern so that the .h file.

Michael Moy

Assignee

Comment 10

•

17 years ago

> I would also make |__attribute__ ((aligned (16)))| a conditional > #define. It's just easier to read and maintain this (with a better > name?): I'll take a look at putting in a #define for alignment. It's somewhat of a sticky problem as the formats are different for MSVC and GCC but we're not worrying about MSVC for now. > Why are you taking out the case for HAVE_SSE2_INTEL_MNEMONICS? I could > be wrong, but on Windows this change will cause us to not use SSE2 > anymore. If you can use the #define SSE2Available 1 trick for MacOSX, > I think we'd want something like this: I think that code branch was for IFAST processing. On Windows, we currently use ISLOW processing which goes to the SSE2 or scalar code depending on the processor capabilities. Just a note that Intel provided IFAST MMX support for MSVC using inline assembler. It's never been used in Firefox to my knowledge but the code is there. It's faster than the ISLOW SSE2 code on Pentium 4s and AMD K8s as they mostly implement SSE2 instructions as pairs of 64-bit instructions. > Hrm, that file contains tabs. I don't know what our policy is for > mozilla/jpeg/ but if you don't mind, could you expand those tabs (to > two spaces each, it seems) for the code you touch? This was a sticky issue for the previous change and the reviewers gave me a pass on untabifying the whole modules. I'll try to use spaces for my code though. > I think you meant + 15 there (so if you're already aligned nothing > changes). And use & ~15 here too. I had 15 in there for a long time and changed it to 16 recently but I don't recall why. After thinking about it, 15 should work. I think that GCC on the Mac aligns memory allocations to 16 bytes but that probably isn't true for Windows and Linux and there may be some non-library memory allocation routines. > Also, if you use PRUptrdiff for both of these you should be 64-bit > safe. If it turns out you did want to keep _MNEMONICS above, make > sure to put it back here. I'll look into that. I wasn't aware of that type. > Is the jpeg_idct_ifast_sse2 function as accurate as the > jpeg_idct_islow one? Or are we ok with a loss of a bit of accuracy > here? JPEG provides for three approaches: IFAST, ISLOW and FLOAT. It's pretty hard to get good performance from FLOAT on x86 processors as you can only do four operations at a time and float operations are generally slower than integer operations. With integer operations 8 operations can be done at a time. The scalar ISLOW code operates in 32-bit integer arithmetic. The vector ISLOW code operates in 16-bit integer arithmetic so that it can fit an eight by eight matrix row in a vector. This code has been used for some time. The IFAST code uses a faster, less accurate integer implementation and there's some information in the header of jidctfst.c. The output of these algorithms is eight-byte integers so I don't think that using 16-bit integer arithmetic is much of a problem. The code here (and in inline assembler form) has been used for about two years in Win32, Windows x64 and Mac OSX builds and I haven't received any idct-related complaints.

jag (Peter Annema)

Comment 11

•

17 years ago

(In reply to comment #10) > > I would also make |__attribute__ ((aligned (16)))| a conditional > > #define. It's just easier to read and maintain this (with a better > > name?): > > I'll take a look at putting in a #define for alignment. It's somewhat of a > sticky problem as the formats are different for MSVC and GCC but we're not > worrying about MSVC for now. As long as they go in the same place, making this a define would make your life even simpler. If they go in between the return type and the function name, or before the return type, you could always wrap the relevant part in a macro, e.g. SSE2_ALIGNED(JBLOCKROW MCU_buffer[D_MAX_BLOCKS_IN_MCU]) or jvirt_barray_ptr SSE2_ALIGNED(whole_image[MAX_COMPONENTS]) which can then place the annotation the the right place. > > Why are you taking out the case for HAVE_SSE2_INTEL_MNEMONICS? I could > > be wrong, but on Windows this change will cause us to not use SSE2 > > anymore. If you can use the #define SSE2Available 1 trick for MacOSX, > > I think we'd want something like this: > > I think that code branch was for IFAST processing. On Windows, we > currently use ISLOW processing which goes to the SSE2 or scalar code > depending on the processor capabilities. Just a note that Intel > provided IFAST MMX support for MSVC using inline assembler. It's never > been used in Firefox to my knowledge but the code is there. It's > faster than the ISLOW SSE2 code on Pentium 4s and AMD K8s as they > mostly implement SSE2 instructions as pairs of 64-bit instructions. Right, so you'd want to use the IFAST processing on Windows, but since on Windows _INTRINSICS isn't defined you'll fall back to the generic code. Am I missing something here, or is that what you mean to do? > > I think you meant + 15 there (so if you're already aligned nothing > > changes). And use & ~15 here too. > > I had 15 in there for a long time and changed it to 16 recently but I > don't recall why. After thinking about it, 15 should work. I think > that GCC on the Mac aligns memory allocations to 16 bytes but that > probably isn't true for Windows and Linux and there may be some > non-library memory allocation routines. It sounds like you're talking about the + 16 for the allocation, which, btw, can also be changed to + 15, but I meant the + 16 for the pointer: + buffer_shift = (PRUint32) + (*cinfo->mem->alloc_large) ((j_common_ptr) cinfo, JPOOL_IMAGE, + D_MAX_BLOCKS_IN_MCU * SIZEOF(JBLOCK) + 0xF); + buffer_shift = (buffer_shift + 0xF) & ~0xF; See also what we do at http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/nsprpub/lib/ds/plarena.h&rev=3.6&root=/cvsroot#99 though I think the _MASK + 1 in the init call at line 103 really only needs to be _MASK. > > Is the jpeg_idct_ifast_sse2 function as accurate as the > > jpeg_idct_islow one? Or are we ok with a loss of a bit of accuracy > > here? > > JPEG provides for three approaches: IFAST, ISLOW and FLOAT. It's > pretty hard to get good performance from FLOAT on x86 processors as > you can only do four operations at a time and float operations are > generally slower than integer operations. With integer operations > 8 operations can be done at a time. > > The scalar ISLOW code operates in 32-bit integer arithmetic. The > vector ISLOW code operates in 16-bit integer arithmetic so that it can > fit an eight by eight matrix row in a vector. This code has been used > for some time. The IFAST code uses a faster, less accurate integer > implementation and there's some information in the header of > jidctfst.c. The output of these algorithms is eight-byte integers so I > don't think that using 16-bit integer arithmetic is much of a problem. > > The code here (and in inline assembler form) has been used for about > two years in Win32, Windows x64 and Mac OSX builds and I haven't > received any idct-related complaints. Ah, that's good to know. jpeglib.h explicitly mentions that JDCT_IFAST is faster but less accurate than JDCT_ISLOW, which I assume they do for a reason. So, since nsJPEGDecoder picked JDCT_ISLOW, what was their reasoning? Maybe they were just playing it safe? Note that I am relatively ignorant on the specifics of jpeg decoding so I'm asking this more for my edification than as a critique on the change.

jag (Peter Annema)

Comment 12

•

17 years ago

Btw, see bug 416906 about using ! after #endif. /* ... */ is the way to go (or just leave it off where it's trivial to connect the #endif with the #ifdef/#if).

Michael Moy

Assignee

Comment 13

•

17 years ago

I modified the checking to #define SSE2Present = 1 if intrinsics are there, to 0 if they aren't so that the SSE2 code doesn't get generated if there MNEMONICS and INTRINSICS aren't set. This got rid of several #ifdefs in nsJPEGDecoder.cpp. I'm still working on the other stuff.

Michael Moy

Assignee

Comment 14

•

17 years ago

Attached patch Color patch with the SSE2 IFAST intrinsics optimizations for OSX (obsolete) — Details — Splinter Review

This is the latest version of the patch, tested on Win32 and Mac/OSX. I will do a writeup on the changes in the form of responses to the review comments later tonight.

Attachment #302573 - Attachment is obsolete: true

Michael Moy

Assignee

Comment 15

•

17 years ago

(Comment #7 jag (Peter Annema) > I would also make |__attribute__ ((aligned (16)))| a conditional #define. > It's just easier to read and maintain this (with a better name?): > JBLOCKROW MCU_buffer[D_MAX_BLOCKS_IN_MCU] SSE2_ALIGNED; > #ifdef D_MULTISCAN_FILES_SUPPORTED > /* In multi-pass modes, we need a virtual block array for each component. */ > jvirt_barray_ptr whole_image[MAX_COMPONENTS] SSE2_ALIGNED; > #endif ! D_MULTISCAN_FILES_SUPPORTED I used the SSE2_ALIGNED() form. For GFX, it handles MSVC and GCC. For JPEG, it just handles GCC for now. > Hrm, that file contains tabs. I don't know what our policy is for mozilla/jpeg/ > but if you don't mind, could you expand those tabs (to two spaces each, it > seems) for the code you touch? I took the tabs out of the changes for this patch. > + buffer_shift = (PRUint32) > + (*cinfo->mem->alloc_large) ((j_common_ptr) cinfo, JPOOL_IMAGE, > + D_MAX_BLOCKS_IN_MCU * SIZEOF(JBLOCK) + 16); > + buffer_shift = ((buffer_shift + 16) >> 4) << 4; > + buffer = (JBLOCKROW) buffer_shift; > I think you meant + 15 there (so if you're already aligned nothing changes). > And use & ~15 here too. > Too bad we don't have NS_PTR_ALIGN(p, a). > Also, if you use PRUptrdiff for both of these you should be 64-bit safe. > If it turns out you did want to keep _MNEMONICS above, make sure to put it back > here. Made the changes from 16 to 15 and am using PRUptrdiff now here, in jdcoeffct.c and jmemmgr.c. Comment 11, jag (Peter Annema): > Right, so you'd want to use the IFAST processing on Windows, but since on > Windows _INTRINSICS isn't defined you'll fall back to the generic code. > Am I missing something here, or is that what you mean to do? I was originally planning to leave the ISLOW code on Windows and add the IFAST code for OSX as time was short and I wanted to get in a minimal code change. The code right now leaves the Windows code as it is and adds IFAST for Mac OSX only. The INTEL_MNEMONICS is set for Win32 so it will use ISLOW (either SSE2 or scalar) as chosen in jddctmgr.c. The code there already exists which is why it isn't seen in the patch. > Ah, that's good to know. jpeglib.h explicitly mentions that JDCT_IFAST is > faster but less accurate than JDCT_ISLOW, which I assume they do for a reason. > So, since nsJPEGDecoder picked JDCT_ISLOW, what was their reasoning? Maybe they > were just playing it safe? Note that I am relatively ignorant on the specifics > of jpeg decoding so I'm asking this more for my edification than as a critique > on the change. I don't have the history on the choice but I think that your assumption is reasonable. Comment 12 jag (Peter Annema): > Btw, see bug 416906 about using ! after #endif. /* ... */ is the way to go (or > just leave it off where it's trivial to connect the #endif with the > #ifdef/#if). Done.

Michael Moy

Assignee

Updated

•

17 years ago

Attachment #306803 - Flags: review?(pavlov)

jag (Peter Annema)

Comment 16

•

17 years ago

I think you can actually just do this: +#ifdef GFX_HAVE_SSE2_INTRINSICS +#define GFX_SSE2_ALIGN __attribute__ ((aligned (16))) +#endif + +#ifdef GFX_HAVE_SSE2_INTEL_MNEMONICS +#define GFX_SSE2_ALIGN __declspec(align(16)) +#endif So you can drop the extra parentheses for (IMO) increased readability: +GFX_SSE2_ALIGN const short color_constants_1[8] = {128, 128, 128, 128, 128, 128, 128, 128}; + SSE2_ALIGN JBLOCKROW MCU_buffer[D_MAX_BLOCKS_IN_MCU]; + SSE2_ALIGN jvirt_barray_ptr whole_image[MAX_COMPONENTS]; +SSE2_ALIGN unsigned short SSE2_ADD_128[8] = + {4096, 4096, 4096, 4096, 4096, 4096, 4096, 4096};

Stuart Parmenter

Comment 17

•

17 years ago

Comment on attachment 306803 [details] [diff] [review] Color patch with the SSE2 IFAST intrinsics optimizations for OSX this patch looks OK to me. Should we address jag's comments?

Attachment #306803 - Flags: review?(pavlov) → review+

Michael Moy

Assignee

Comment 18

•

17 years ago

Attached patch Updated patch with SSE2_ALIGN changes (obsolete) — Details — Splinter Review

Modified the SSE2_ALIGN macros as per jag's comments. I had to use two patches to build as I ran into the Parental Controls Vista SDK issue. I recreated the patch and had to strip the Parental Controls stuff.

Attachment #306803 - Attachment is obsolete: true

Attachment #313882 - Flags: review?(pavlov)

Mike Schroepfer

Updated

•

17 years ago

Flags: wanted-next+

Michael Moy

Assignee

Comment 19

•

17 years ago

This change exposes a problem with VS2005 SP1 and the SDK (I think). I just upgraded to SP1 and the SDK and the include "intrin.h" results in a dual definition failure. I looked it up on Google and it's a common problem. The recommendation was to comment out the declaration in intrin.h which works.

Mike Schroepfer

Comment 20

•

17 years ago

I'd like to continue to make progress on this for 1.9.1..

Flags: wanted1.9.1+

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Updated

•

17 years ago

Priority: -- → P2

Joe Drew (not getting mail)

Comment 21

•

17 years ago

Is it just VS2005's bugs that's holding this patch up? Or is there anything else needed?

Michael Moy

Assignee

Comment 22

•

17 years ago

I don't think that the updated patch was rereviewed.

Stuart Parmenter

Updated

•

17 years ago

Attachment #313882 - Flags: review?(pavlov) → review+

Michael Moy

Assignee

Comment 23

•

17 years ago

Attached patch Updated patch for 3.1 (obsolete) — Details — Splinter Review

I had to modify the patch as there was an extra or missing line from one of the modules that I had to fix. The patch has been tested on Windows and Mac OSX on the 3.1 code base.

Attachment #313882 - Attachment is obsolete: true

Michael Moy

Assignee

Comment 24

•

17 years ago

Attached patch Updated patch for tinderbox — Details — Splinter Review

Using a name as a variable or #define didn't work in getting the compiler to ignore the SIMD code that wouldn't compile on unsupported platforms. So I changed the plumbing to use GFX_SSE2_POSSIBLE (meaning that SSE2 support is possible on the hardware/software platform) and GFX_SSE2_AVAILABLE as a variable or #define that determines at compile-time (for Mac/OSX Intel) or run-time (for Win32) which path to take. The SIMD code is only active when GFX_SSE2_POSSIBLE is defined so that there won't be compilation errors on unsupported platforms. I also modified the gfx code to use __sse2_available on Win32. This flag is used by the memcpy CRT routine (and others) to determine SSE2 support at runtime and is present in VS2003, VS2005 and VS2008. This removes the need to use __cpuid and therefore, intrin.h which caused multiply defined symbols.

Attachment #332163 - Attachment is obsolete: true

Michael Moy

Assignee

Comment 25

•

17 years ago

Comment on attachment 332500 [details] [diff] [review] Updated patch for tinderbox Some additional changes to the SSE2 infrastructure to get it to work on the tryserver. The patch passed the tryserver build on Linux, Mac OSX and Windows.

Attachment #332500 - Flags: review?(pavlov)

Stuart Parmenter

Updated

•

17 years ago

Attachment #332500 - Flags: review?(pavlov) → review+

Michael Moy

Assignee

Comment 26

•

17 years ago

Who should I request for Superreview?

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Comment 27

•

17 years ago

Comment on attachment 332500 [details] [diff] [review] Updated patch for tinderbox Looks good to me.

Attachment #332500 - Flags: superreview+

Michael Moy

Assignee

Comment 28

•

17 years ago

Checkin requested (unless there's another testing phase required).

Ryan VanderMeulen [:RyanVM]

Comment 29

•

17 years ago

mmoy, I'm adding the checkin-needed flag to the whiteboard for you. It would help whoever checks it in if you provide commit message you want to show in the Hg log when they check the patch in.

Keywords: checkin-needed

Michael Moy

Assignee

Comment 30

•

17 years ago

One other comment - there may be resulting changes in JPEG test results as JPEG results are not lossless and there can be small differences due to computational differences. Do these tests get assigned to me?

Reed Loden [:reed]

Comment 31

•

17 years ago

Pushed as 17112:6eec92f9276a.

Status: ASSIGNED → RESOLVED

Closed: 17 years ago

Keywords: checkin-needed

Resolution: --- → FIXED

Target Milestone: --- → mozilla1.9.1a2

Reed Loden [:reed]

Comment 32

•

17 years ago

Yeah, the tests need to be updated... :(

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Makoto Kato [:m_kato]

Comment 33

•

17 years ago

Althouhg changing JPEG decoder setting to IFAST, it need update test codes like commet #32. But, I believe that SSE2 code of ycc_rgb_convert_argb() functin is usefull for all platforms without test regression. Michael, could you separate this into IFAST issue and SSE2 code issue of ycc_rgb_convert_argb() function? How do you think this?

Michael Moy

Assignee

Comment 34

•

17 years ago

I wanted to use IFAST for Mac OSX because of the performance regression in JPEG from 2.* to 3.0 but leave ISLOW SSE2 for Win32. The Mac OSX IFAST code generally runs fine on Windows x64. It probably runs fine on Linux but I can't test that. I will look into separating out the argb code from the idct code. I can also just port the ISLOW SSE2 code to intrinsics. Ideally, there would be a preference so that the user could select the transform code based on their needs.

jag (Peter Annema)

Comment 35

•

17 years ago

What needs to be done to update the tests? Would we end up with different test results on different platforms?

Nobody; OK to take it and work on it

Updated

•

17 years ago

Product: Core → Core Graveyard

Reed Loden [:reed]

Updated

•

16 years ago

Component: GFX → Graphics

Product: Core Graveyard → Core

QA Contact: general → thebes

Target Milestone: mozilla1.9.1a2 → ---

Reed Loden [:reed]

Updated

•

16 years ago

Flags: wanted1.9.2?

Whiteboard: [needs tests updated]

Makoto Kato [:m_kato]

Updated

•

15 years ago

Blocks: jpeg-perf

Justin Lebar (not reading bugmail)

Comment 36

•

15 years ago

I think this whole patch will be unnecessary once we land libjpeg-turbo (bug 573948). Anyone mind if I WONTFIX?

Alfred Kayser

Comment 37

•

15 years ago

It is only open for the updating of tests, so that part can/should remain open.

Justin Lebar (not reading bugmail)

Comment 38

•

15 years ago

(In reply to comment #37) > It is only open for the updating of tests, so that part can/should remain open. I don't follow the discussion above about the tests. What about this bug is remaining to be done?

Makoto Kato [:m_kato]

Comment 39

•

15 years ago

(In reply to comment #38) > (In reply to comment #37) > > It is only open for the updating of tests, so that part can/should remain open. > > I don't follow the discussion above about the tests. What about this bug is > remaining to be done? Michel changed jpeg decompression algorithm from ISLOW to IFAST. So reftest is failed. If we can remove ycc_rgb_convert_argb by merging libjpeg-turbo, we should dup of bug 573948.

Justin Lebar (not reading bugmail)

Comment 40

•

15 years ago

libjpeg-turbo has a vectorized ycc_rgb_convert_argb function. Mind if I dupe with bug 573948?

Jeff Muizelaar [:jrmuizel]

Comment 41

•

15 years ago

(In reply to comment #40) > libjpeg-turbo has a vectorized ycc_rgb_convert_argb function. Mind if I dupe > with bug 573948? Seems sane.

Justin Lebar (not reading bugmail)

Updated

•

15 years ago

Status: REOPENED → RESOLVED

Closed: 17 years ago → 15 years ago

Resolution: --- → DUPLICATE

First pass at the patch 18 years ago Michael Moy 12.81 KB, patch		Details \| Diff \| Splinter Review
SSE2 JPEG color optimization 18 years ago Michael Moy 13.32 KB, patch		Details \| Diff \| Splinter Review
Color patch with the SSE2 IFAST intrinsics optimizations for OSX 18 years ago Michael Moy 44.19 KB, patch		Details \| Diff \| Splinter Review
Color patch with the SSE2 IFAST intrinsics optimizations for OSX 17 years ago Michael Moy 50.12 KB, patch	pavlov : review+	Details \| Diff \| Splinter Review
Updated patch with SSE2_ALIGN changes 17 years ago Michael Moy 48.72 KB, patch	pavlov : review+	Details \| Diff \| Splinter Review
Updated patch for 3.1 17 years ago Michael Moy 46.54 KB, patch		Details \| Diff \| Splinter Review
Updated patch for tinderbox 17 years ago Michael Moy 42.98 KB, patch	pavlov : review+ vlad : superreview+	Details \| Diff \| Splinter Review