Closed
Bug 242145
Opened 20 years ago
Closed 14 years ago
Optimization of Inverse Discrete Cosine Transform for SSE2
Categories
(Core :: Graphics: ImageLib, defect)
Tracking
()
RESOLVED
DUPLICATE
of bug 573948
People
(Reporter: mmoy, Assigned: mmoy)
References
Details
(Keywords: perf)
Attachments
(2 files, 5 obsolete files)
1.25 KB,
patch
|
Details | Diff | Splinter Review | |
26.43 KB,
patch
|
pavlov
:
review-
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7b) Gecko/20040427 Firefox/0.8.0+ (mmoy-O2-GL7-SSE2-crc32-quek01235) Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7b) Gecko/20040427 Firefox/0.8.0+ (mmoy-O2-GL7-SSE2-crc32-quek01235) ~50% code path reduction of the IDCT SSE2 code in jidctint.c Reproducible: Always Steps to Reproduce: 1. 2. 3.
Assignee | ||
Comment 1•20 years ago
|
||
jddctmgr.c patch
Assignee | ||
Updated•20 years ago
|
Attachment #147363 -
Attachment is patch: true
Assignee | ||
Comment 2•20 years ago
|
||
Assignee | ||
Comment 3•20 years ago
|
||
The patches should be reviewed by pavlov@pavlov.net but I can't figure out how to set the review flag. I can see the Flags: and Requestee: tags but there's no place to fill them in. One other thing is that the jddctmgr.c patch generates a compiler warning on msvc++ in the cast to an unsigned integer in the SSE2 varianted code. I tried a variety of things to get a clean compile but didn't come up with anything.
Comment 4•20 years ago
|
||
(In reply to comment #3) > The patches should be reviewed by pavlov@pavlov.net but I can't figure out how > to set the review flag. I can see the Flags: and Requestee: tags but there's no > place to fill them in. make sure to select "?" (review, superreview), then the field becomes editable.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Assignee | ||
Comment 5•20 years ago
|
||
Here is what I see in my screen: http://mysite.verizon.net/vze6w3dz/bugzilla.JPG The only place that I see a ? option is in the blocking 1.7 and 1.8a dropdown boxes and that doesn't seem like the right place. http://mysite.verizon.net/vze6w3dz/bugzilla.JPG(In reply to comment #4) > (In reply to comment #3) > > The patches should be reviewed by pavlov@pavlov.net but I can't figure out how > > to set the review flag. I can see the Flags: and Requestee: tags but there's no > > place to fill them in. > make sure to select "?" (review, superreview), then the field becomes editable.
Comment 6•20 years ago
|
||
Isn't that uint cast going to break on a 64bit machine? And, you need to edit the attachment to see the review boxes.
Assignee | ||
Comment 7•20 years ago
|
||
(In reply to comment #6) > Isn't that uint cast going to break on a 64bit machine? No. The SSE2 code is varianted on XP_WIN32 (see jmorecfg.h and jdapimin.c). Even if it weren't varianted, I think the cast would be okay. The & would be a problem over 4 GB of memory though. A (x / 16) * 16 would probably be better but I was looking for something with a fast execution time. > And, you need to edit the attachment to see the review boxes. I'll go take a look.
Assignee | ||
Updated•20 years ago
|
Attachment #147363 -
Flags: review?(pavlov)
Assignee | ||
Updated•20 years ago
|
Attachment #147364 -
Flags: review?(pavlov)
Comment 8•20 years ago
|
||
Have you got some perf numbers ? Is there a test suite for JPEG on the internet ?
Comment 9•20 years ago
|
||
what will happen when running with this patch on a CPU that doesn't support SSE2?
Keywords: perf
Assignee | ||
Comment 10•20 years ago
|
||
(In reply to comment #9) > what will happen when running with this patch on a CPU that doesn't support SSE2? The C code path will be used. The code is conditionally compiled for Windows. If Windows is found, then a CPUID check is run on jpeg initialization which determines the capabilities of the machine. Currently flags are set for MMX and/or SSE2 if the CPU has this capability. The cpuid instruction is used to determine this. The code is at the bottom of jdapimin.c: #ifdef HAVE_SSE2_INTEL_MNEMONICS static int sse2support() { int sse2available = 0; int my_edx; _asm { mov eax, 01 cpuid mov my_edx, edx } if (my_edx & (0x1 << 26)) sse2available = 1; else sse2available = 2; return sse2available; } #endif
Assignee | ||
Comment 11•20 years ago
|
||
(In reply to comment #8) > Have you got some perf numbers ? No. I do have some performance tests that I ran when I filed the bug to turn on the SSE2 code that eliminated disk and network variances by using a RamDisk for the software and the images. It loaded three 10 MB files and used a JavaScript timer to try to remove the human aspect. The big problem is to get Windows to quiet down enough to get a decent test. I have received anecdotal comments from users that have downloaded my latest and intermediate builds and I didn't see any negative comments on performance or rendering. If you would like to give it a try, the Pentium4 build is available at http://www.pryan.org/firefox/mmoy/ The build there will only run on SSE2 enabled machines, though, as the optimization flags include arch:SSE2. My intermediate builds have cosine, cosine-b and cosine-c in them. You can see the download statistics on these builds at http://www.pryan.org/stats/ > Is there a test suite for JPEG on the internet ? Not that I know of. If there were one, I'd expect that it would be something that you want to run locally.
Assignee | ||
Comment 12•20 years ago
|
||
(In reply to comment #6) > Isn't that uint cast going to break on a 64bit machine? I think that I'm going to explictly put the conditional compilation code in for jddctmgr to only appear on Windows so that it won't even get compiled on 64-bit and non-Windows platforms to eliminate potential compilation problems. I'll make the changes and run a full build which will probably be done this afternoon.
Assignee | ||
Comment 13•20 years ago
|
||
Added conditional compilation for alignment code.
Assignee | ||
Updated•20 years ago
|
Attachment #147363 -
Attachment is obsolete: true
Assignee | ||
Updated•20 years ago
|
Attachment #147497 -
Flags: review?(pavlov)
Comment 14•20 years ago
|
||
Comment on attachment 147363 [details] [diff] [review] jddctmgr.c patch (to align a structure) clearing obsolete review request
Attachment #147363 -
Flags: review?(pavlov)
Assignee | ||
Comment 15•20 years ago
|
||
I ran my JPEG performance test for my 5/1 build against the official nightly 5/2 build and found a 15% performance improvement. The test uses a Javascript timer and loads three 10 MB images in one page. The Program Directory is stored on a ramdisk as are the three 10 MB images along with the html webpage. The profile used is stored on disk so differences in disk latency are not minimized by using the ramdisk. The IDCT is part of JPEG decompression and other parts of the JPEG code may be worth a look too. I have an Intel CD-DROM with VTUNE on it that came with an Intel Optimization book but I haven't tried it out yet. It would probably provide more accurate statistics in comparing the performance of the current IDCT SSE2 code with my version. The version of VTUNE that I have isn't a full version and I don't know if I can do this analysis or not.
Assignee | ||
Comment 16•20 years ago
|
||
Another builder reported a crash on startup with this code on Athlon XP though it works fine on P4s. I requested the assembler code around the crash point to debug the platform difference. I've had several hundred downloads and this is the first report of a crash problem. I'll report back when I get more information.
Assignee | ||
Comment 17•20 years ago
|
||
There's a one-line cut/paste bug in the jddctmgr.c patch. I'll upload a new one Saturday night.
Assignee | ||
Comment 18•20 years ago
|
||
Attachment #147497 -
Attachment is obsolete: true
Assignee | ||
Updated•20 years ago
|
Attachment #147497 -
Flags: review?(pavlov)
Assignee | ||
Comment 19•20 years ago
|
||
Attachment #148023 -
Attachment is obsolete: true
Assignee | ||
Updated•20 years ago
|
Attachment #148024 -
Flags: review?(pavlov)
Comment 20•20 years ago
|
||
Is there a possibility to port your optimizations to SSE? Users of slower systems with SSE support could probably need better performance rather than those with already fast SSE2 systems (of course better performance is always appreciated, doesn't really matter how fast your system already is ;-))
Assignee | ||
Comment 21•20 years ago
|
||
(In reply to comment #20) > Is there a possibility to port your optimizations to SSE? Users of slower > systems with SSE support could probably need better performance rather than > those with already fast SSE2 systems (of course better performance is always > appreciated, doesn't really matter how fast your system already is ;-)) I have an SSE port that's about 80% done. When I'm done, I'll do some performance comparisons to see if it's worthwhile trying to get it into the code base. The SSE port was considerably more work than the SSE2 port as there isn't nearly as much to work with in SSE compared to SSE2 when you're working with integers. The SSE floating point support is very good but an FP approach that I tried proved to only provide very small returns. I think that the high cost of integer to FP and back to integer may have negated a little less work in scaling and descaling.
Comment 22•20 years ago
|
||
15% perf increase looks very promising, thanks ! I'll consider changing my motherboard and CPU ;-) Just one more question: Are you sure that the new code is absolutely equivalent to the old code? I'm thinking about using both old and new code to decompress some images and compare the uncompressed bitmaps (possibly convert jpeg to uncompressed bmp files) to prove that they are the same. I can't test this because I don't have SSE2 :-(
Assignee | ||
Comment 23•20 years ago
|
||
(In reply to comment #22) > Just one more question: > Are you sure that the new code is absolutely equivalent to the old code? Well, that's like asking if someone is sure that there are absolutely no bugs in their code. I laid out what I did to go from the old code to the new code and was fairly careful at each stage. Is it possible that I made an error in my understanding of one or two of the SSE2 instructions? Of course. In general, I have a bunch of ~1 mb images that I used to verify the code from one stage to the next and you are somewhat dependent on my coding abilities and my eyesight. One thing that I'll mention though. When you make a mistake in coding with this stuff, 19 times out of 20, you know it very quickly. There have been about 600 downloads of the code at various levels of porting and I haven't received one report of rendering problems though. > I'm thinking about using both old and new code to decompress some images and > compare the uncompressed bitmaps (possibly convert jpeg to uncompressed bmp > files) to prove that they are the same. > I can't test this because I don't have SSE2 :-( I'd welcome such a test. With current review methods, I think reviewers are limited to checking the code which can be quite time-consuming if you're not conversant with these esoteric assembler instructions.
Assignee | ||
Comment 24•20 years ago
|
||
When you made your comment about comparing bitmaps, I thought to myself how would you do that? You'd have to write code to intercept the output from the JPEG code and write it out to a file. Of course it struck me later that one could simply do a COPY IMAGE and then paste it into PAINT and then save it as a bitmap (that's probably the default anyways). I may give this a try later this week. I also finished the SSE port this evening and it's 19% faster than the official nightly using my large JPEG rendering test.
Comment 25•20 years ago
|
||
You do realize that the jpeg directory is just a modified version of the IJG codebase, which contains a command line decoder (djpeg.c) that can be used for timing/testing, right?
Assignee | ||
Comment 26•20 years ago
|
||
(In reply to comment #25) > You do realize that the jpeg directory is just a modified version of the IJG > codebase, which contains a command line decoder (djpeg.c) that can be used > for timing/testing, right? Yes to the former, no to the latter. I'll have to take a look at that.
Assignee | ||
Comment 27•20 years ago
|
||
(In reply to comment #20) > Is there a possibility to port your optimizations to SSE? Users of slower > systems with SSE support could probably need better performance rather than > those with already fast SSE2 systems (of course better performance is always > appreciated, doesn't really matter how fast your system already is ;-)) Give this build a try: http://www.pryan.org/mozilla/firefox/mmoy/mmoy-05-03-2004-Pentium3G.exe
Assignee | ||
Comment 28•20 years ago
|
||
Out of curiousity, how long does it take to get something reviewed? My other two changes got a lot of attention pretty quickly (turning on SSE2 and fixing the SSE2 alignment problem) but this one seems to be taking a while.
Comment 29•20 years ago
|
||
(In reply to comment #28) > Out of curiousity, how long does it take to get something reviewed? You have to take the bug (assign to yourself), go on IRC (chatzilla), find a suitable reviewer. Ultimately mail drivers@mozilla.org to find one.
Assignee | ||
Comment 30•20 years ago
|
||
Added some small improvements.
Attachment #147364 -
Attachment is obsolete: true
Assignee | ||
Updated•20 years ago
|
Attachment #149128 -
Flags: review?(pavlov)
Comment 31•20 years ago
|
||
can you post a diff of jidcint.c instead of the full file?
Assignee | ||
Comment 32•20 years ago
|
||
Prev file was the c file; not the patch
Attachment #149128 -
Attachment is obsolete: true
Assignee | ||
Updated•20 years ago
|
Attachment #149128 -
Flags: review?(pavlov)
Assignee | ||
Updated•20 years ago
|
Attachment #149139 -
Flags: review?(pavlov)
Assignee | ||
Updated•20 years ago
|
Attachment #147364 -
Flags: review?(pavlov)
Assignee | ||
Comment 33•20 years ago
|
||
This website consistently crashes FF on Linux. It's intermittant on Windows but crashes just about everytime I go to that site on Linux. I don't have a debug environment on my Linux box though nor do I have a build environment.
Assignee | ||
Comment 34•20 years ago
|
||
(In reply to comment #33) > This website consistently crashes FF on Linux. It's intermittant on Windows but > crashes just about everytime I go to that site on Linux. I don't have a debug > environment on my Linux box though nor do I have a build environment. Please ignore this post, it's meant for another bug.
Assignee: jdunn → nobody
QA Contact: imagelib
Comment 35•17 years ago
|
||
stuart, any chance you could take a look at this for 1.9?
Flags: blocking1.9?
Comment 36•17 years ago
|
||
I don't know intel SSE2 so I'm not really qualified to review this. Not sure who is.
Comment 37•17 years ago
|
||
Alfred, do you know SSE2?
Assignee | ||
Comment 38•17 years ago
|
||
You could ask Makoto Kato as he's generally good with assembler.
Flags: blocking1.9? → blocking1.9-
Whiteboard: [wanted-1.9]
Comment 39•17 years ago
|
||
Comment on attachment 149139 [details] [diff] [review] jidctint.c patch i just tried running with this patch and I crash loading images on many common websites. in jpeg_idct_islow_sse2 doing: pmullw xmm0, QWORD PTR [edx+0]
Attachment #149139 -
Flags: review?(pavlov) → review-
Updated•17 years ago
|
Assignee: nobody → mmoy
Assignee | ||
Comment 40•17 years ago
|
||
Did you also apply the jddctmgr.c patch? That patch should align the structure that the pmullw is crashing on. That was a long time ago when I wasn't all that good at doing patches for multiple modules in one file.
Comment 41•17 years ago
|
||
i'm not sure that I applied all the correct patches. Is there any way you can put them together in a single patch? We're trying to find someone good to review the patches and would like to get them in for Firefox 3 if possible.
Updated•17 years ago
|
Flags: wanted1.9+
Whiteboard: [wanted-1.9]
Comment 42•17 years ago
|
||
Michael - once the smoke clears on the other JPEG patches we can try for this one...
Comment 43•15 years ago
|
||
Could it be that this bug is superceeded by bug 477728? The patches look more or less the same. Adding dependency link to signal relationship, but I think one is a DUP of another...
Depends on: 477728
Comment 44•14 years ago
|
||
This is subsumed by bug 573948: Switch to libjpeg-turbo. I'm going to dupe; feel free to undupe if you think this is wrong.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → DUPLICATE
Updated•12 years ago
|
Attachment #148024 -
Flags: review?(pavlov)
You need to log in
before you can comment on or make changes to this bug.
Description
•