We have added SSE2 optimization as bug 445552 and bug 456028, but only for MSVC++ x86 and GCC. I will add it for MSVC++ x64, which doesn't support inline assembler.
thanks for the initiative on these patches, Makoto. I think we should definitely have a path with intrinsics for x86_64 MSVC++ since it doesn't support inline assembler. The reason I didn't originally write this stuff with intrinsics is that vlad was worried that some of the code would end up less efficient that way. Once we verify that this code works for win64, I'd be curious to run some performance numbers on regular x86 of intrinsics vs inline assembler.
Patch looks pretty good. Before I r+ it though, I'd like to take a look at the results myself. I'm compiling the intrinsics version on my macbook. I'll post once I test it out.
arg, I built when the tree wasn't green. I'll get back to this tomorrow. Makoto - do you have any good way to test the performance of this patch?
This code path is for Windows x64 builds only (you need to get all the patches from my tree at http://hg.mozilla-x86-64.com/). Except for VC++ on x86_64, the previous code is used. If you want to use it on another platform, you need to change which compilers take this code path. gcc 4.x for Intel and VC++ 8.0 for x86 can both use these built-in SSE2 instructions such as _mm_*. I have tested on x86_64 Ubuntu/Fedora with gcc by modifying the code.
Comment on attachment 362537 [details] [diff] [review] patch v1.1 ok, looks good. r=bholley. Flagging vlad for sr. I'm running a modified version of this patch through the tryserver to see how the intrinsics fare against the raw assembly on mac/linux.
Note, lcms has been replaced with qcms recently.
qcms is bug 487900.
Marking WONTFIX, as lcms is no longer used and the equivalent improvement was added to qcms instead.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
Comment on attachment 362537 [details] [diff] [review] patch v1.1 not using lcms anymore. canceling sr