Closed Bug 125762 Opened 23 years ago Closed 21 years ago

Optimized inverse discrete cosine transform (iDCT) functions for libjpeg

Categories

(Core :: Graphics: ImageLib, defect, P4)

x86
Windows 2000
defect

Tracking

()

RESOLVED FIXED
mozilla1.1alpha

People

(Reporter: cathleennscp, Assigned: cathleennscp)

References

Details

(Keywords: perf, Whiteboard: [adt1])

Attachments

(3 files, 2 obsolete files)

Keywords: perf
hmmm what's this bug about? Any specifics? What about another bug ala "Faster mozilla"...:)
Yeah, this bug is kinda vague... :-). /be
I'd be very surprised to see a profile of mozilla where libjpeg was a significant contributor. Working with an extremely throttled proxy connection, I do see that we don't seem to be generating paint events often enough for smooth progressive loading.
taking. more info to come shortly.
Assignee: cathleen → pavlov
Severity: normal → minor
Status: NEW → ASSIGNED
Priority: -- → P4
Target Milestone: --- → mozilla1.0
Moving Netscape owned 0.9.9 and 1.0 bugs that don't have an nsbeta1, nsbeta1+, topembed, topembed+, Mozilla0.9.9+ or Mozilla1.0+ keyword. Please send any questions or feedback about this to adt@netscape.com. You can search for "Moving bugs not scheduled for a project" to quickly delete this bugmail.
Target Milestone: mozilla1.0 → mozilla1.2
Target Milestone: mozilla1.2 → mozilla1.0
Keywords: mozilla1.0, nsbeta1
per adt, not critical for nsbeta1. hence minus.
Keywords: nsbeta1 → nsbeta1-
we never got the info what this bug is about... "faster jpeg decoder" is not very informative. Is this bug about some specific imagelib code? Please also file "faster imap code" bug...:)
this is critical for nsbeta1, see bugscape
http://bugscape.mcom.com/show_bug.cgi?id=12175
removing nsbeta1-
everyone, please be patient. more info soon.
Keywords: nsbeta1- → nsbeta1
pav/cathleen believe this is critical and almost ready. So a plus.
Keywords: nsbeta1 → nsbeta1+
What's almost ready? Is jpeg decoding performance such a critical issue that we want to get rid of our tried-and-tested-for-a-decade IJG code for some new thing? If it didn't come out of IE, it feels like a lot of risk, and I haven't seen a lot of people complaining about jpeg performance. (Most of the images on the web, especially out of the top100, are GIFs, right?) Someone convince me that this is worth the review, super-review and integration cycles, when we have a boatload of serious imagelib bugs (5 crashes, including a topcrash+) assigned to pavlov and missing 0.9.9 as I type this. I'm filled with doubt, and find myself wishing, for the first time in my life, that I was pav's manager. =)
calm down beavis. this is a small patch from an external contributor to libjpeg. It isn't an all-new decoder. This code isn't a huge win. Raw JPEG decoding (using djpeg) only showed about a 3% improvement on a 1.5GHz P4 machine. I don't believe this is at all critical for 0.9.9 but is something that would be nice to have for 1.0.
Summary: faster jpeg decoder → Optimized inverse discrete cosine transform (iDCT) functions for libjpeg
If it's just a small patch, then why is it and the surrounding discussion living in bugscape?
because it has licensed stuff from the external contributor. we're just waiting on an ok from mitchell on the additional license stuff
shaver: be careful what you wish for :-) and as cathleen once said "everyone, please be patient"
Let's see if I understand: we're looking at 3% improvement in pure jpeg decode on a relatively small fraction of deployed machines (P4s), which is likely dwarfed by the rest of mozilla's imaging pipeline. To get this the legal people are messing around with licensing issues? Surely they have better things to spend their time on. If Intel wants to contribute their patch to libjpeg under the IJG terms that's one thing, but let's not waste any more effort than is strictly necessary.
Attached patch P4 SSE2 JPEG optimization (obsolete)
request for r/sr :-)
Attachment #75677 - Attachment is obsolete: true
removed #if 0 from last patch. it was added when we were trying to figure out compiling errs on machines with no processor pack installed. request r/sr
correcting previous error on attach ID #
r=james.rose@intel.com on attachment #75989
James Rose - don't take this personally, but this is the first time I've seen your name in bugzilla. Could you briefly mention your qualifications for reviewing this patch and disclose if you were involved in creating said patch? Thanks.
James Rose is the Intel developer we're working with. I asked him to review the patch to make sure it is matching what they're expecting.
ok this is a quick non-authoritative review.

1. IANAL, mitchell: the mozilla.org policy as i understood it was that new files under non tri license had to be stuck in other-licenses. There's a file here that clearly doesn't fit that policy.

2. The check is for Win32, x86 and __m128i. My guess is that __m128i indicates SSE2. but if that's the case then why the other checks. If that's not the case then we need some questions answered: can this be true for BC5.5, DMC, MSVC5.2, or OpenWatcom?

I just looked up __m128i using google, and it appears to be the right check. So I'm curious as to why you have the x86 check. If you're concerned about endianness, you should check for that...

[At the very least, whatever you do really shouldn't break any compilers that currently work, (that's MSVC5.2, 6, 7) -- I doubt they would, but I'm just checking on behalf of some concerned parties...]

Assuming someone had Intel's C++ compiler for linux working, could it use this optimization (if not for the win32 ifdef)?

This is bad:

+#ifdef HAVE_SSE2_INTEL_MNEMONICS
+        if(SSE2Available == 1)
+        {
+          method_ptr = jpeg_idct_islow_sse2;
+          method = JDCT_ISLOW;
+        }
+        else
+        {
+          method_ptr = jpeg_idct_islow;
+          method = JDCT_ISLOW;
+        }
+#else
+        method_ptr = jpeg_idct_islow;
+        method = JDCT_ISLOW;

I'd suggest that we do this instead:

+#ifdef HAVE_SSE2_INTEL_MNEMONICS
+        if(SSE2Available == 1)
+          method_ptr = jpeg_idct_islow_sse2;
+        else
+#endif
+          method_ptr = jpeg_idct_islow;
+        method = JDCT_ISLOW;

Note that your (I'm using this generically, I have no idea who wrote the patch because that stuff is hidden in bugscape) patch results in 3 code paths, all of which set one field to the same value, and results in two identical code paths, which might need to be maintained. the #ifdef DCT_IFAST_SUPPORTED case should also be changed so we don't have to maintain three code paths.

--

If someone was trying to make the code in the new file line up, they failed. Could someone either strip whitespace to single spacing + indentation rules, or make everything almost line up? (I suspect there are tabs involved. tabs are taboo)

+* and are shifted to the left for rise of accuracy

'rise'?

+* Dequantize 8x8 block of DCT coefficients
+* Inverse DCT transform, de-quantization and level shift

i suppose dequantize is a word, and de-quantization is some accepted flavor, but I couldn't find it in m-w.com or dictionary.com, I'll go try to learn what it is this evening.
James Rose said in private email that he did the integration of intel jpeg code into libjpg for this patch. We don't allow people to review their own patches - you'll need another reviewer.
True, James Rose contributed the code, but we generated the patches, so i thought it would be a good idea to make sure the patch we generated is exactly what they proposed. (and he did catch an err earlier) anyway, since i have tried Intel's code and tested, and even though I'll be checking it into mozilla for Intel, I'll put my stamp on it.
r=cathleen on attachment #75989
Attachment #75989 - Flags: review+
Keywords: mozilla1.0 → mozilla1.0-
Whiteboard: [adt3]
Comment on attachment 75989
P4 SSE2 JPEG optimization

timeless' comments are correct, though one would hope that the compiler will hoist the constant assignment; as always, I will gripe about post-fix incrementing used when prefix is intended ... but these are minor nits not enough to stop this from going in. This code is reasonable.

The question about whether it should be built in the mozilla case must take into consideration the size cost (~ 2k? ... Cathleen, can you follow up with numbers?) vs. the speed win and how often we get it. Currently, I understand that mozilla won't build this code path, since we would need the processor pack to make the compiler do it. Even if we had the processor pack, it looks like we could configure this off in autoconf by forcing |HAVE_SSE2_INTEL_MNEMONICS| off. That means someday, drivers will have another decision to make about whether to turn this on or not.

In the meantime, sr=scc
Attachment #75989 - Flags: superreview+
wrt mozilla actually building this path, we might. (I ran into this stuff again in some other venue, so i've been thinking about it a bit.) I think that systems that build with crypto might actually have the processor pack because it might be a requirement of nss. this comment posted with nc4 instead of qnx voyager (i'm sorry about the earlier one not wrapping).
code size:
  original - 74,240 b
  w/ patch - 73,728 b (a bit smaller, possibly optimized by combo of service pack 5 and processor pack)

pageload on P4 2.2GHz:
  original - 296
  w/ patch - 296

pageload on P2 200MHz:
  original - 2688
  w/ patch - 2689

* SSE2 code will not get compiled if Processor Pack is not installed on compiling machine.
* SSE2 code will not get executed on machines other than P4 with SSE2 chips.
* JPEG decompression on large JPEG (1MB or larger) is about 5-7% improvement, although the internal page-load test is not showing a difference.
Assignee: pavlov → cathleen
Status: ASSIGNED → NEW
Keywords: adt1.0.0
Whiteboard: [adt3] → [adt1]
This has been nsbeta1- with Gagan, and marked by Mozilla as 1.0-. Pls let us know why we should be taking this at this time?
Marking adt1.0.0- on behalf of the ADT. We're at a point where we can't take the risk associated with this into Mach V. We should get this into the trunk after Mozilla branches for 1.0.
Keywords: adt1.0.0 → adt1.0.0-
patch landed on mozilla trunk on 4/29/02, for post 1.0 releases.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Target Milestone: mozilla1.0 → mozilla1.1alpha
Verified fix checked into lxr.mozilla.org
Status: RESOLVED → VERIFIED
Ok, apparently this new code was never turned on (which would explain the result in comment 27) because defined(__m128i) isn't the right test. Looking through the web it seems like the _M_IX86 test should be enough to test for presence of the processor pack needed to compile the SSE2 code.
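A minimal sketch of why a `defined(__m128i)` test can silently disable a code path, assuming (as in the compilers I am aware of) that `__m128i` is a type supplied by a header rather than a preprocessor macro. `vec128_t`, `HAVE_VEC128` and the two helper functions are hypothetical names used only for illustration; they are not taken from the patch.

```c
/* vec128_t stands in for a compiler-provided vector type such as __m128i.
 * It is a typedef, so the preprocessor knows nothing about it. */
typedef struct { unsigned char bytes[16]; } vec128_t;

#if defined(vec128_t)
#define TYPE_TEST_FIRED 1   /* never selected: defined() only sees macros */
#else
#define TYPE_TEST_FIRED 0
#endif

/* A feature macro, by contrast, is visible to the preprocessor. */
#define HAVE_VEC128 1

#if defined(HAVE_VEC128)
#define MACRO_TEST_FIRED 1
#else
#define MACRO_TEST_FIRED 0
#endif

/* Small accessors so the outcome of each test can be inspected at run time. */
int type_test_fired(void)  { return TYPE_TEST_FIRED; }
int macro_test_fired(void) { return MACRO_TEST_FIRED; }
```

Under that assumption the guarded SSE2 code would only be compiled if someone defined `__m128i` as a macro by hand, for example on the compiler command line.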
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
*** Bug 236288 has been marked as a duplicate of this bug. ***
You can find builds and their descriptions at http://forums.mozillazine.org/viewtopic.php?t=54487 You can find performance tests and results showing at the bug that was just marked a duplicate (236288). I have a build with the SSE2 JPEG stuff in it that runs and I will test it on non-SSE2 builds later tonight. I'll also upload so that anyone that cares to can try it out to ensure that it doesn't break other Windows processor/OS platforms.
Attached patch change check for sse2 (obsolete)
This causes SSE2 support to be compiled on Visual Studio 6 (with SP5 and processor pack, as specified on the mozilla.org build instructions) and Visual Studio .net 2003.
Here's a build with SSE2 JPEG enabled that should run fine on any WinTel platform: http://www.pryan.org/mozilla/firefox/mmoy/FireFoxB-Experimental-2004-03-03-D__m128i=1.exe I've already tested it on one of my PIIIs. Test away on older machines if you have the time and the inclination.
(In reply to comment #35)
> Created an attachment (id=142867)
> change check for sse2
>
> This causes SSE2 support to be compiled on Visual Studio 6 (with SP5 and
> processor pack, as specified on the mozilla.org build instructions) and
> Visual Studio .net 2003.

There's a comment in Bug 236288 where this code change causes a problem with SeaMonkey. This code change in SeaMonkey either crashes on startup or crashes after going to one or two pages. So perhaps a flag could be used so that this only gets built for FireFox.

I'm working on a FireFox build right now but will do a debug SeaMonkey build tomorrow to see if I can find the problem over there. Of course if someone else is planning on doing this, send me an email and I won't bother.
Crashing in the jpeg code is likely due to bug 137478 - swalker reports that with the latest patch from there things seem to be working fine with either O1 or O2.
(In reply to comment #38)
> Crashing in the jpeg code is likely due to bug 137478 - swalker reports
> that with the latest patch from there things seem to be working fine with
> either O1 or O2.

I did three SeaMonkey builds with the 2003-10-24 05:46 PST jdapimin.c patch and they all failed. I've done a number of builds with FireFox with the same patch that had no problems. I have a FireFox build running right now with the older jdapimin.c patch in it. I'll fire up a SeaMonkey debug build with the new patch from today after the FireFox build finishes so we can find out if the crash goes away or not. If the crash is still there, we should have a stack trace to look at.

The crash in SeaMonkey was with GL-G7-O2-arch:SSE2 optimization which is about as aggressive as you can get. I'll probably take out GL as this may not work well with debug.

I got the feeling that the bug was fixed with the older patch and that today's patch was more of a better way of doing it rather than a change to fix a crash problem with the older patch.

Again, let me know if you disagree or plan to do the build and test.
Well, the builds didn't go so well last night. See the FireFox build forums if you're interested in the gory details. I also blew away my build environment by accident but had a backup which I restored. The SeaMonkey build is running right now off of a stable tarball that I pulled down a few days ago. The debug option apparently forces O2 optimization to O1 so I'll have to do another build without debug to test O2.
The debug build dies in dbgheap with an assertion. Continuing generates another assertion. I didn't see anything obvious in the stack trace (first time I've used this version of Visual Studio for debugging). I'm going to take out debug and all optimizations and just turn on __m128i to see if this works. If it does, I'll add back things until it breaks again. I'll save the debug build as it might be useful later on. If anyone wants to play with it, let me know and I'll put it up tonight.
Taking out debug and the optimizations gave me the No Bindings for XBL objects or something similar to that. I'm wondering if the problem is with the SSE2 test code. The patch that you referred to does testing for MMX capability and the previous patch worked fine. Now that we're enabling SSE2, I'm wondering if it's a problem in the sse2support routine at the bottom of jdapimin.c. I changed the code to just set it to 1 instead of calling sse2support to eliminate that mixed C and assembler code as a cause of the problem and the build is off and running again.
Finally got a SeaMonkey build to work. Optimization was arch:SSE2, O2 and G7. I left GL out to save build time. I modified the routines at the bottom of jdapimin.c to return 1 and this worked. Next up is a build with the latest jdapimin.c patch (MMX) with the SSE2 routine just returning a 1. If this works, then I'll put the SSE2 code back in which should either show or refute a problem with that code.
I put the mmx code back into jdapimin.c with the latest patch and the build worked. Note that the sse2 code just returns a 1 (sse2 available). I just started a build with the sse2 code added back in. If this one crashes, then that's a pretty good indication that the problem is in the sse2support routine. This build has the -FAs switch to generate a listing of the assembler code which may make spotting the problem easy. Here's the code from jdapimin.c for reference.

int sse2support()
{
    int sse2available = 0;
    int my_edx;
    _asm
    {
        mov eax, 01
        cpuid
        mov my_edx, edx
    }
    if (my_edx & (0x1 << 26))
        sse2available = 1;
    else sse2available = 2;

    return sse2available;
}
; 479  : {

    push    ecx
    push    ebx

; 480  :     int sse2available = 0;
; 481  :     int my_edx;
; 482  :     _asm
; 483  :     {
; 484  :         mov eax, 01

    mov     eax, 1

; 485  :         cpuid

    cpuid

; 486  :         mov my_edx, edx

    mov     DWORD PTR _my_edx$[esp+8], edx

; 487  :     }
; 488  :     if (my_edx & (0x1 << 26))

    mov     eax, DWORD PTR _my_edx$[esp+8]
    and     eax, 67108864               ; 04000000H
    neg     eax
    sbb     eax, eax
    add     eax, 2
    pop     ebx

; 489  :         sse2available = 1;
; 490  :     else sse2available = 2;
; 491  :
; 492  :     return sse2available;
; 493  : }

    pop     ecx
    ret     0
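For readers following the listing, here is a small sketch in plain C of the bit test the routine performs on the CPUID leaf 1 EDX value, plus a step-by-step emulation of the branchless `neg` / `sbb` / `add` sequence the compiler emitted. The function names are hypothetical, and the 1 / 2 return convention simply mirrors the quoted `sse2support` routine.

```c
/* Bit 26 of EDX from CPUID leaf 1 is the SSE2 feature flag.
 * (1 << 26) == 0x04000000 == 67108864, the constant in the listing. */
#define CPUID_EDX_SSE2_BIT 26

/* Same convention as the quoted routine: 1 = SSE2 reported, 2 = not reported. */
int sse2_status_from_edx(unsigned long edx)
{
    return (edx & (1UL << CPUID_EDX_SSE2_BIT)) ? 1 : 2;
}

/* Emulates the compiler's branchless sequence one instruction at a time. */
int sse2_status_branchless(unsigned long edx)
{
    unsigned long masked = edx & (1UL << CPUID_EDX_SSE2_BIT); /* and eax, 04000000H */
    int carry = (masked != 0); /* neg eax: carry flag set when operand is nonzero */
    int eax = -carry;          /* sbb eax, eax: yields 0 or -1 */
    return eax + 2;            /* add eax, 2: yields 2 or 1 */
}
```

Both functions agree for every input, which suggests the generated code for the C part of the routine matches the source; the sketch says nothing about how the surrounding inline assembly interacts with the optimizer.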
Added back in the sse2available code and the build worked and the browser worked. Then added GL optimization back in and it crashed. So the problem is seen only with GL optimization (from my perspective). I changed sse2available back to just returning 1 and am building now.
Stop us from crashing on 386 and older 486 processors, which didn't have the cpuid instruction.
Attachment #142867 - Attachment is obsolete: true
Comment on attachment 142994
same + prevent using cpuid on older machines

r=me. But if we have problems with the new sse3 code and no one steps up really fast to fix it, let's just remove it.
Attachment #142994 - Flags: review+
i mean sse2 of course. All sse3 code should be immediately disabled since such instructions aren't supported by any cpu, or even invented :)
Attachment #142994 - Flags: superreview?(bryner)
Attachment #142994 - Flags: superreview?(bryner) → superreview+
Checked in.
Status: REOPENED → RESOLVED
Closed: 23 years ago → 21 years ago
Resolution: --- → FIXED
I get a compile error on Windows ME, with VC++ 6.0:

fatal error C1600: unsupported data type
make[1]: *** [jidctint.obj] Error 2

If I back out the change to jmorecfg.h, it compiles and links.
Do you have the service and processor packs that are in the build requirements? http://www.mozilla.org/build/win32.html#ss2.2
alex, tor: I got this same build problem and getting the latest service and processor packs fixed it for me. I posted something to n.p.m.builds (news://news.mozilla.org:119/404BA31C.1060905@mozilla.org) in case someone else hits the same problem and googles for it.
I only have the standard edition, so I'll just stick with my patched tree to get around requiring an assembler. (Just not building crypto used to be enough :-/)
(In reply to comment #53)
> alex, tor: I got this same build problem and getting the latest service and
> processor packs fixed it for me.
>

And what about those of us who're compiling with Cygwin, eh? Yep, same damn build problem :-)
(In reply to comment #55)
> (In reply to comment #53)
> > alex, tor: I got this same build problem and getting the latest service and
> > processor packs fixed it for me.
> >
>
> And what about those of us who're compiling with Cygwin, eh? Yep, same damn
> build problem :-)

We don't hit this when compiling with mingw gcc. However, I think that may be a fluke. The standalone mingw compiler doesn't define _M_IX86. It expects to get that define from <windows.h>. I'm guessing the cygwin gcc defines _M_IX86 by default.

Besides the __declspec(align) issue, gcc will barf on the inlined intel asm. gcc uses the at&t syntax.
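One way to keep gcc away from MSVC-only inline assembly is to key the guard on the compiler as well as the target. The sketch below is only an illustration of that idea; `SKETCH_USE_MSVC_SSE2_ASM` and `msvc_sse2_asm_enabled` are hypothetical names, and this is not claimed to be what the attached "No mmx/sse2 for win32 gcc" patch actually does.

```c
/* Enable the MSVC inline-assembly path only when the compiler is MSVC
 * (_MSC_VER) targeting 32-bit x86 (_M_IX86), and never under gcc. */
#if defined(_MSC_VER) && defined(_M_IX86) && !defined(__GNUC__)
#define SKETCH_USE_MSVC_SSE2_ASM 1
#else
#define SKETCH_USE_MSVC_SSE2_ASM 0
#endif

/* Reports which branch was selected at compile time. */
int msvc_sse2_asm_enabled(void)
{
    return SKETCH_USE_MSVC_SSE2_ASM;
}
```

With this shape, a gcc environment that happens to define `_M_IX86` still falls through to the portable C code.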
Attachment #144153 - Flags: superreview?(tor)
Attachment #144153 - Flags: approval1.7b?
Attachment #144153 - Flags: superreview?(tor) → superreview+
Attachment #144153 - Flags: approval1.7b? → approval1.7?
Comment on attachment 144153
No mmx/sse2 for win32 gcc

a=asa (on behalf of drivers) for checkin to 1.7
Attachment #144153 - Flags: approval1.7? → approval1.7+
What does GCC have as far as SSE2 support goes? Does it have intrinsics? If so, it should be pretty easy to port it over.
Just curious, since Mr. Dotzler gave a= for this patch: Will this change cause a visible increase in JPEG rendering speed on P4/SSE2 architectures? I have a P4-M laptop with a P4/SSE2-optimized 20040325 build and can perform some rough benchmarks if needed.
In the bug that I found, I reported my performance testing results and found that the SSE2 code resulted in 30% less time in rendering JPEG images. I used three images that totalled about 11 MB with nothing else on the page and everything was loaded from RamDisk (Program, html and image files) except for the Profile. The SSE2 code should be in any MSVC++ build after the code went in. You don't need an optimized build to get the SSE2 JPEG benefit. What would be nice is if the code were ported to GCC and Linux and even nicer if it was ported from an Integer implementation to a Floating Point implementation as I think the latter would be more efficient.
I did some major surgery on the JPEG SSE2 code and brought the codepath down by about 50%. The code path before was 600 instructions and it's now 316. I didn't include the call/return overhead for the five procedures in my calculations so the 600 number is low. There is no call/return overhead in my code as it's all in one routine now. And I've made a lot of effort to reduce memory access. I did three public builds along the way and users were pretty happy with the results. Though I've made quite a few improvements since my last released build. Question: how do I go about getting the code into Mozilla? I have patch files to implement the change with documentation as to what I did to jidctint.c.
Michael, you should open another bug and submit your patch for review to get the code into Mozilla.

About this bug, I really dislike the way the patch was implemented.

1) Why are those optimizations only done for Wintel platforms, rather than Intel platforms? There are other platforms that run on intel chips with SSE2, ie. Linux, OS/2, Solaris x86. Should they not benefit too?

2) Why is the assembly code inline in C files? Several C compilers on different x86 platforms cannot process inline assembly. The assembly code should be in separate .s or .asm files, not in C files.
(In reply to comment #63)
> Michael, you should open another bug and submit your patch for review to get the
> code into Mozilla.

I'll try to figure out how to open a bug to do that.

> About this bug, I really dislike the way the patch was implemented.
>
> 1) Why are those optimizations only done for Wintel platforms, rather than Intel
> platforms ? There are other platforms that run on intel chips with SSE2, ie.
> Linux, OS/2, Solaris x86 . Should they not benefit too ?

Well, I don't personally have machines that run those operating systems.

> 2) Why is the assembly code inline in C files ? Several C compilers on different
> x86 platforms cannot process inline assembly. The assembly code should be in
> separate .s or .asm files , not in C files .

MSVC++ makes it really easy to do Intel extensions in that they support inline assembly and intrinsics. Some of the nice things about inline is that you can use your C variable in your assembler code and that it takes care of the saving and restoring of registers in and out of your routines. If you want, you can write a little C then a little Assembler, then a little C and so on. If you're porting a routine, it's very nice in that you can convert a little of the code at a time inline.

Is it the best way to do portable code? Only if you go by Bill's definition of portable. But if I had the hardware and software, I imagine a port would be as hard as doing a build.

I also have a port to SSE for Pentium about 40% done and have done a little work on an MMX port.
> Well, I don't personally have machines that run those operating systems. I don't know whether you care about propagating this code further than Mozilla-on-Windows, but be notified: your chances of getting code accepted into upstream libjpeg with the above approach are nil. We did not sweat bullets to create a portable library only so that cowboys with zero interest in portability could slap random patches on top.
(In reply to comment #65)
> > Well, I don't personally have machines that run those operating systems.
>
> I don't know whether you care about propagating this code further than
> Mozilla-on-Windows, but be notified: your chances of getting code accepted into
> upstream libjpeg with the above approach are nil. We did not sweat bullets to
> create a portable library only so that cowboys with zero interest in portability
> could slap random patches on top.

Send me some hardware and I'll be happy to port it.
I figured out how to port this to GCC, at least for Linux. For those that are wondering why no one has ported to GCC in the past: Microsoft Visual C++ Inline Assembly support is much easier to use than what's in GCC. Another thing is that it's relatively easy to find examples of MSVC++ Inline Assembly and MSVC++ Inline Assembly with MMX/SSE/SSE2 instructions. GCC Inline Assembly and GCC SIMD Inline Assembly examples are few and far between on Newsgroups and on the WWW in general. Though there's one more working example as of this morning. I just need to line up someone that can build and test on P4 Linux. There are a few Linux Unofficial builders on MozillaZine and perhaps one of them will volunteer. My P3 SSE port is done as well and there are lots of people using it in the form of unofficial builds. And I plan a port to Intrinsics for Windows 64 as Microsoft's development tools don't support Inline Assembly for that platform. I just need to get my hands on an Athlon 64 system.
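As a rough illustration of the intrinsics route mentioned above, the sketch below adds two arrays of 16-bit values with signed saturation using SSE2 intrinsics from `<emmintrin.h>`, which compile with both MSVC and GCC without any inline assembly. This is not the IDCT from the patch; `add_sat_i16` and `SKETCH_HAVE_SSE2` are hypothetical names, and the code assumes `short` is 16 bits wide. A scalar loop handles the tail and serves as the fallback when SSE2 is not enabled at compile time.

```c
#include <stddef.h>

/* Use the intrinsics only when the compiler is targeting SSE2. */
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#else
#define SKETCH_HAVE_SSE2 0
#endif

/* out[i] = saturate_int16(a[i] + b[i]) for i in [0, n). */
void add_sat_i16(const short *a, const short *b, short *out, size_t n)
{
    size_t i = 0;
#if SKETCH_HAVE_SSE2
    /* Eight 16-bit lanes per 128-bit register; unaligned loads and stores. */
    for (; i + 8 <= n; i += 8) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_adds_epi16(va, vb));
    }
#endif
    /* Scalar tail, and the whole job when SSE2 is unavailable. */
    for (; i < n; ++i) {
        int s = (int)a[i] + (int)b[i];
        if (s > 32767) s = 32767;
        if (s < -32768) s = -32768;
        out[i] = (short)s;
    }
}
```

Note that this selects the path at compile time only; a build meant to run on older processors would still need a runtime check before calling into SSE2 code.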
I'm looking for a volunteer with a GCC Pentium 4 or Pentium 3 build environment on Linux to work with me on the port of the SSE and SSE2 code. This will be an iterative project with me porting and someone building and testing. At the moment, I have the CPUID stuff done and am looking for a sanity check on that. I've posted in the Unofficial Builders forum but don't have any responses yet. This is an opportunity for those that want the SSE/SSE2 code working on Linux to step up to the plate and help to get it done.
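For the CPUID part on GCC, a present-day sketch can lean on the `<cpuid.h>` header that newer GCC and Clang releases ship, rather than hand-written inline assembly. This is an assumption about the toolchain and is not the code the commenter wrote; `cpu_reports_sse2` and `SKETCH_HAVE_CPUID_H` are hypothetical names. `__get_cpuid` returns 0 when the requested leaf is unsupported, so the function falls back to "no SSE2" in that case and on non-x86 targets.

```c
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define SKETCH_HAVE_CPUID_H 1
#else
#define SKETCH_HAVE_CPUID_H 0
#endif

/* Returns 1 if CPUID leaf 1 reports SSE2 (EDX bit 26), otherwise 0. */
int cpu_reports_sse2(void)
{
#if SKETCH_HAVE_CPUID_H
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;               /* leaf 1 not available */
    return (int)((edx >> 26) & 1u);
#else
    return 0;                   /* not an x86 GCC/Clang build */
#endif
}
```

This only answers whether the processor advertises SSE2; whether the operating system preserves the XMM registers is a separate question.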
It may have caused major regression bug 247437 (see attached screenshot), the other offender would be bug 137478
There are many talkback crashes in Mozilla 1.7 sse2 code (@ dct_8x8_inv_16s) , starting on 2004031615 (72 records from talkback), this is currently topcrash #5 for Mozilla 1.7. URL in the comments are: http://www.anandtech.com http://www.diezeit.de http://www.linux.org/ http://strasbourg.eauxvives.free.fr/phpBB2 I can't reproduce because I don't have SSE2 but it would be worth a try ...
Those sites all open fine for me. Could you send me the assembler before and after the crash point? Also, is hardware information available?

(In reply to comment #70)
> There are many talkback crashes in Mozilla 1.7 sse2 code (@ dct_8x8_inv_16s) ,
> starting on 2004031615 (72 records from talkback), this is currently topcrash #5
> for Mozilla 1.7.
>
> URL in the comments are:
> http://www.anandtech.com
> http://www.diezeit.de
> http://www.linux.org/
> http://strasbourg.eauxvives.free.fr/phpBB2
>
> I can't reproduce because I don't have SSE2 but it would be worth a try ...
>
Have there been any reports on FireFox?
Yes, firefox 0.9 2004061423.
All crashes seem to occur on [Windows NT 4.0 build 1381]; the trigger reason is "Illegal instruction", at
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/jpeg/jidctint.c&mark=716#716
It looks like the CPU supports SSE2, but NT4 doesn't like it. Maybe NT4 needs a special driver or service pack to enable SSE2?
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&threadm=ZZmc7.3239%24NW6.1508767%40news1.sttln1.wa.home.com&rnum=3&prev=/groups%3Fq%3Dwindows%2Bnt%2Bsse2%26hl%3Den%26lr%3D%26ie%3DUTF-8%26selm%3DZZmc7.3239%2524NW6.1508767%2540news1.sttln1.wa.home.com%26rnum%3D3

From: Jerry Coffin (jcoffin@taeus.com)
Subject: Re: SSE under WinNT
Newsgroups: comp.lang.asm.x86
Date: 2001-08-09 00:56:09 PST

In article <ZZmc7.3239$NW6.1508767@news1.sttln1.wa.home.com>, ryan- 113@home.com says...

[ ... ]

> I ran into this awhile ago, and ended up deciding that NT 4 Service Pack 5
> did not support SSE instructions. I looked into it more today, and I ended
> up deciding again that NT4 sp5 (or at least my installation) doesn't support
> SSE. I'd be very happy to find out that I'm wrong, since it's a large issue
> for me. I'd also like to know why the OS has to support a particular
> instruction set.

The operating system has to save/restore the CPU registers during context switches. There's a driver on the Intel web site to support SSE on NT 4.
What should the solution be? Require the driver or disable this for NT? I have a non-SSE2 1.7 build at http://www.pryan.org/mozilla/seamonkey/mmoy/Mozilla-1.7-Release-O1-G6-noSSE2.exe that can be used as a temporary workaround.
(In reply to comment #75)
> What should the solution be?

For now, filed bug 248509 with all the information.

> Require the driver or disable this for NT?

For me the best solution would be to add SSE2 OS detection with MSVC build and disable SSE2 for other Windows compilers. Be careful of the __try/__except result on Win98 though.
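The proposal above amounts to requiring three independent conditions before the SSE2 routine is selected. The sketch below expresses only that decision logic in plain C; all names are hypothetical, and how the operating-system condition is actually probed (a Windows API query, a guarded test instruction, or something else) is deliberately left out, since the thread does not settle it.

```c
/* Which IDCT implementation to call. */
typedef enum { IDCT_PATH_C = 0, IDCT_PATH_SSE2 = 1 } idct_path;

/* built_with_sse2:    the SSE2 routine was compiled into this binary
 * cpu_has_sse2:       CPUID reports the SSE2 feature bit
 * os_saves_xmm_state: the OS preserves XMM registers across context
 *                     switches (the piece NT4 appears to be missing) */
idct_path choose_idct_path(int built_with_sse2,
                           int cpu_has_sse2,
                           int os_saves_xmm_state)
{
    if (built_with_sse2 && cpu_has_sse2 && os_saves_xmm_state)
        return IDCT_PATH_SSE2;
    return IDCT_PATH_C;   /* portable C fallback in every other case */
}
```

The crashes described in the preceding comments correspond to the case where the first two inputs are true and the third is false.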
I'll post further replies over there.
Depends on: 248509