Closed Bug 125762 Opened 22 years ago Closed 20 years ago

Optimized inverse discrete cosine transform (iDCT) functions for libjpeg

Categories

(Core :: Graphics: ImageLib, defect, P4)

x86
Windows 2000
defect

Tracking

()

RESOLVED FIXED
mozilla1.1alpha

People

(Reporter: cathleennscp, Assigned: cathleennscp)

References

Details

(Keywords: perf, Whiteboard: [adt1])

Attachments

(3 files, 2 obsolete files)

 
Keywords: perf
hmmm what's this bug about? Any specifics? What about another bug ala "Faster 
mozilla"...:)
Yeah, this bug is kinda vague...  :-).

/be
I'd be very suprised to see a profile of mozilla where
libjpeg was a significant contributor.

Working with an extremely throttled proxy connection, I do
see that we don't seem to be generating paint events often
enough for smooth progressive loading.
taking.  more info to come shortly.
Assignee: cathleen → pavlov
Severity: normal → minor
Status: NEW → ASSIGNED
Priority: -- → P4
Target Milestone: --- → mozilla1.0
Moving Netscape owned 0.9.9 and 1.0 bugs that don't have an nsbeta1, nsbeta1+,
topembed, topembed+, Mozilla0.9.9+ or Mozilla1.0+ keyword.  Please send any
questions or feedback about this to adt@netscape.com.  You can search for
"Moving bugs not scheduled for a project" to quickly delete this bugmail.
Target Milestone: mozilla1.0 → mozilla1.2
Target Milestone: mozilla1.2 → mozilla1.0
Keywords: mozilla1.0, nsbeta1
per adt, not critical for nsbeta1. hence minus.
Keywords: nsbeta1nsbeta1-
we never got the info what this bug is about...
"faster jpeg decoder" is not very informative. Is this bug about some specific
imagelib code?
Please also file "faster imap code" bug...:)
this is critical for nsbeta1, see bugscape
http://bugscape.mcom.com/show_bug.cgi?id=12175
removing nsbeta1-

everyone, please be patient.  more info soon.
Keywords: nsbeta1-nsbeta1
pav/cathleen believe this is critical and almost ready. So a plus. 
Keywords: nsbeta1nsbeta1+
What's almost ready?  Is jpeg decoding performance such a critical issue that we
want to get rid of our tried-and-tested-for-a-decade IJG code for some new
thing?  If it didn't come out of IE, it feels like a lot of risk, and I haven't
seen a lot of people complaining about jpeg performance.  (Most of the images on
the web, especially out of the top100, are GIFs, right?)

Someone convince me that this is worth the review, super-review and integration
cycles, when we have a boatload of serious imagelib bugs (5 crashes, including a
topcrash+) assigned to pavlov and missing 0.9.9 as I type this.  I'm filled with
doubt, and find myself wishing, for the first time in my life, that I was pav's
manager. =)
calm down beavis.  this is a small patch from an external contributor to
libjpeg.  It isn't a all-new decoder.  This code isn't a huge win.  Raw JPEG
decoding (using djpeg) only showed about a 3% improvement on a 1.5GHz P4
machine.  I don't believe this is at all critical for 0.9.9 but is something
that would be nice to have for 1.0.
Summary: faster jpeg decoder → Optimized inverse discrete cosine transform (iDCT) functions for libjpeg
If it's just a small patch, then why is it and the surrounding discussion
living in bugscape?
because it has licensed stuff from the external contributor.  we're just waiting
on an ok from mitchell on the additional license stuff
shaver: be careful what you wish for :-) and as cathleen once said "everyone,
please be patient"
Let's see if I understand:  we're looking at 3% improvement in pure jpeg
decode on a relatively small fraction of deployed machines (P4s), which 
is likely dwarfed by the rest of mozilla's imaging pipeline.  

To get this the legal people are messing around with licensing issues?
Surely they have better things to spend their time on.  If Intel wants
to contribute their patch to libjpeg under the IJG terms that's one
thing, but let's not waste any more effort than is strictly necessary.
Attached patch P4 SSE2 JPEG optimization (obsolete) — Splinter Review
request for r/sr  :-)
Attachment #75677 - Attachment is obsolete: true
removed #if 0 from last patch.
it was added when we were trying to figure out compiling errs on machines with
no processor pack installed. 

request r/sr
correcting previous error on attach ID #

r=james.rose@intel.com
on attachment #75989 [details] [diff] [review]
James Rose - don't take this personally, but this is the first time I've seen
your name in bugzilla.  Could you briefly mention your qualifications for
reviewing this patch and disclose if you were involved in creating said
patch?  Thanks.
Jame Rose is the Intel developer we're working with.
I asked him to review the patch to make sure it is matching what they're expecting.
ok this is a quick non authorative review.

1. IANAL, mitchell: the mozilla.org policy as i understood it was that new files under non tri license had to be stuck in other-licenses. There's a file here that clearly doesn't fit that policy.

2. The check is for Win32, x86 and __m128i.  My guess is that __m128i indicates SSE2. but if that's the case then why the other checks.  If that's not the case then we need some questions answered:
can this be true for BC5.5, DMC, MSVC5.2, or OpenWatcom
I just looked up __m128i using google, and it appears to be the right check.  So I'm curious as to why you have the x86 check.  If you're concerned about endianess, you should check for that...

[At the very least, whatever you do really shouldn't break any compilers that currently work, (that's MSVC5.2, 6, 7) -- I doubt they would, but I'm just checking on behalf of some concerned parties...]

Assuming someone had Intel's C++ compiler for linux working, could it use this optimization (if not for the win32 ifdef)?

This is bad:
+#ifdef HAVE_SSE2_INTEL_MNEMONICS
+               if(SSE2Available == 1)
+               {
+                       method_ptr = jpeg_idct_islow_sse2;
+                       method = JDCT_ISLOW;
+               }
+               else
+               {
+                       method_ptr = jpeg_idct_islow;
+                       method = JDCT_ISLOW;
+               }
+#else
+               method_ptr = jpeg_idct_islow;
+               method = JDCT_ISLOW;
I'd suggest that we do this instead:

+#ifdef HAVE_SSE2_INTEL_MNEMONICS
+               if(SSE2Available == 1)
+                       method_ptr = jpeg_idct_islow_sse2;
+               else
+#endif
+                       method_ptr = jpeg_idct_islow;
+               method = JDCT_ISLOW;

Note that your (I'm using this generically, I have no idea who wrote the patch because that stuff is hidden in bugscape) patch results in 3 code paths, all of which set one field to the same value, and results in two identical code paths, which might need to be maintained.

the
 #ifdef DCT_IFAST_SUPPORTED
case should also be changed so we don't have to maintain three code paths.

--

If someone was trying to make the code in the new file line up, they failed. Could someone either strip whitespace two single spacing + indentation rules, or make everything almost line up? (I suspect there are tabs involved. tabs are taboo)

+* and are shifted to the left for rise of accuracy
'rise'?

+*    Dequantize 8x8 block of DCT coefficients
+*    Inverse DCT transform, de-quantization and level shift

i suppose dequantize is a word, and de-quantization is some accepted flavor, but I couldn't find it in m-w.com or dictionary.com, I'll go try to learn what it is this evening.
James Rose said in private email that he did the integration of intel jpeg
code into libjpg for this patch.  We don't allow people to review their own
patches - you'll need another reviewer.
True, James Rose contributed the code, but we generated the patches, so i
thought it would be a good idea to make sure the patch we generated is exactly
what they proposed.  (and he did catch an err earlier)

anyway, since i have tried Intel's code and tested, and even though I'll be
checking it into mozilla for Intel, I'll put my stamp on it.

r=cathleen on attachment #75989 [details] [diff] [review]
Attachment #75989 - Flags: review+
Keywords: mozilla1.0mozilla1.0-
Whiteboard: [adt3]
Comment on attachment 75989 [details] [diff] [review]
P4 SSE2 JPEG optimization

timeless' comments are correct, though one would hope that the compiler will
hoist the constant assignment; as always, I will gripe about post-fix
incrementing used when prefix is intended ... but these are minor nits not
enough to stop this from going in.  This code is reasonable.  The question
about whether it should be built in the mozilla case must take into
consideration the size cost (~ 2k? ... Cathleen, can you follow up with
numbers?) vs. the speed win and how often we get it.  Currently, I understand
that mozilla won't build this code path, since we would need the processor pack
to make the compiler do it.  Even if we had the processor pack, it looks like
we could configure this off in autoconf by forcing |HAVE_SSE2_INTEL_MNEMONICS|
off.  That means someday, drivers will have another decision to make about
whether to turn this on or not.  In the meantime, sr=scc
Attachment #75989 - Flags: superreview+
wrt mozilla actually building this path, we might. (I ran into this stuff again 
in some other venue, so i've been thinking about it a bit.) I think that 
systems that build with crypto might actually have the processor pack because 
it might be a requirement of nss.

this comment posted with nc4 instead of qnx voyager (i'm sorry about the 
earlier one not wrapping).
code size:
original - 74,240 b
w/ patch - 73,728 b (a bit smaller, possibly optimized by combo of service pack
5 and processor pack)

pageload on P4 2.2GHz
original - 296
w/ patch - 296

pageload on p2 200MHz
original - 2688
w/ patch - 2689


* SSE2 code will not get compiled if Processor Pack is not installed on
compiling machine.
* SSE2 code will not get executed on machines other than P4 with SSE2 chips
* JPEG decompression on large JPEG ( 1MB or larger ) is about 5-7% improvement.
although internal page-load test not showing difference.
Assignee: pavlov → cathleen
Status: ASSIGNED → NEW
Keywords: adt1.0.0
Whiteboard: [adt3] → [adt1]
This has been nsbeta1- with Gagan, and marked by Mozilla as 1.0-. Pls let us
know why we should be taking this at this time?
Marking adt1.0.0- on behalf of the ADT.  We're at a point where we can't take
the risk associated with this into Mach V.  We should get this into the trunk
after Mozilla branches for 1.0.
Keywords: adt1.0.0adt1.0.0-
patch landed on mozilla trunk on 4/29/02, for post 1.0 releases.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Target Milestone: mozilla1.0 → mozilla1.1alpha
Verified fix checked into lxr.mozilla.org
Status: RESOLVED → VERIFIED
Ok, apparently this new code was never turned on (which would explain the
result in comment 27) because defined(__m128i) isn't the right test.
Looking through the web it seems like the _M_IX86 test should be enough
to test for presence of the processor pack needed to compile the SSE2 code.
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
*** Bug 236288 has been marked as a duplicate of this bug. ***
You can find builds and their descriptions at http://forums.mozillazine.org/viewtopic.php?t=54487

You can find performance tests and results showing at the bug that was just marked a duplicate (236288).

I have a build with the SSE2 JPEG stuff in it that runs and I will test it on
non-SSE2 builds later tonight. I'll also upload so that anyone that cares to can
try it out to ensure that it doesn't break other Windows processor/OS platforms.
Attached patch change check for sse2 (obsolete) — Splinter Review
This causes SSE2 support to be compiled on Visual Studio 6 (with SP5 and
processor pack, as specified on the mozilla.org build instructions) and
Visual Studio .net 2003.
Here's a build with SSE2 JPEG enabled that should run fine on any WinTel platform:

http://www.pryan.org/mozilla/firefox/mmoy/FireFoxB-Experimental-2004-03-03-D__m128i=1.exe

I've already tested it on one of my PIIIs. Test away on older machines if you have the time and the inclination.
(In reply to comment #35)
> Created an attachment (id=142867)
> change check for sse2
> 
> This causes SSE2 support to be compiled on Visual Studio 6 (with SP5 and
> processor pack, as specified on the mozilla.org build instructions) and
> Visual Studio .net 2003.

There's a comment in Bug 236288 where this code changes causes a problem with
SeaMonkey. This code change in SeaMonkey either crashes on startup or crashes
after going to one or two pages. So perhaps some a flag could be used so that
this only gets built for FireFox.

I'm working on a FireFox build right now but will do a debug SeaMonkey build
tomorrow to see if I can find the problem over there. Of course if someone
else is planning on doing this, send me an email and I won't bother.
Crashing in the jpeg code is likely due to bug 137478 - swalker reports
that with the latest patch from there things seem to be working fine with
either O1 or O2.
(In reply to comment #38)
> Crashing in the jpeg code is likely due to bug 137478 - swalker reports
> that with the latest patch from there things seem to be working fine with
> either O1 or O2.

I did three SeaMonkey builds with the 2003-10-24 05:46 PST jdapimin.c patch and they all failed. I've done a number of builds with FireFox with the same patch that had no problems.

I have a FireFox build running right now with the older jdapimin.c patch in it.
I'll fire up a SeaMonkey debug build with the new patch from today after the
FireFox build finishes so we can find out if the crash goes away or not. If the crash is still there, we should have a stack trace to look at.

The crash in SeaMonkey was with GL-G7-O2-arch:SSE2 optimization which is about as aggressive as you can get. I'll probably take out GL as this may not work well with debug.

I got the feeling that the bug was fixed with the older patch and that today's patch was more of a better way of doing it rather than a change to fix a crash problem with the older patch. Again, let me know if you disagree or plan to do the build and test.
Well, the builds didn't go so well last night. See the FireFox build forums
if you're interested in the gory details. I also blew away my build environment
by accident but had a backup which I restored.

The SeaMonkey build is running right now off of a stable tarball that I pulled
down a few days ago. The debug option apparantly forces O2 optimization to O1
so I'll have to do another build without debug to test O2.
The debug build dies in dbgheap with an assertion. Continueing generates another
assertion. I didn't see anything obvious in the stack trace (first time I've used
this version of Visual Studio for debugging).

I'm going to take out debug and all optimizations and just turn on __m128i to see
if this works. If it does, I'll add back things until it breaks again.

I'll save the debug build as it might be useful later on. If anyone wants to play
with it, let me know and I'll put it up tonight.
Taking out debug and the optimizations gave me the No Bindings for XBL objects
or something similar to that. I'm wondering if the problem is with the SSE2
test code. The patch that you referred to does testing for MMX capability and
the previous patch worked fine. Now that we're enabling SSE2, I'm wondering
if it's a problem in the sse2support routine at the bottom of jdapimin.c.

I changed the code to just set it to 1 instead of calling sse2support to
eliminate that mixed C and assembler code as a cause of the problem and the
build is off and running again.
Finally got a SeaMonkey build to work. Optimization was arch:SSE2, O2 and G7. I
left GL out to save build time. I modified the routines at the bottom of
jdapimin.c to return 1 and this worked.

Next up is a build with the latest jdapimin.c patch (MMX) with the SSE2 routine
just returning a 1. If this works, then I'll put the SSE2 code back in which
should either show or refute a problem with that code.
I put the mmx code back into jdapimin.c with the latest patch and the build worked. Note that the sse2 code just returns a 1 (sse2 available).

I just started a build with the sse2 code added back in. If this one crashes,
then that's a pretty good indication that the problem is in the sse2support
routine. This build has the -FAs switch to generate a listing of the assembler
code which may make spotting the problem easy. Here's the code from jdapimin.c
for reference.

int sse2support()
{
	int sse2available = 0;
	int my_edx;
	_asm
	{
		mov eax, 01                       
		cpuid                                    
		mov my_edx, edx    
	}
	if (my_edx & (0x1 << 26)) 
		sse2available = 1; 
	else sse2available = 2;

	return sse2available;
}
; 479  : {

	push	ecx
	push	ebx

; 480  : 	int sse2available = 0;
; 481  : 	int my_edx;
; 482  : 	_asm
; 483  : 	{
; 484  : 		mov eax, 01                       

	mov	eax, 1

; 485  : 		cpuid                                    

	cpuid

; 486  : 		mov my_edx, edx    

	mov	DWORD PTR _my_edx$[esp+8], edx

; 487  : 	}
; 488  : 	if (my_edx & (0x1 << 26)) 

	mov	eax, DWORD PTR _my_edx$[esp+8]
	and	eax, 67108864				; 04000000H
	neg	eax
	sbb	eax, eax
	add	eax, 2
	pop	ebx

; 489  : 		sse2available = 1; 
; 490  : 	else sse2available = 2;
; 491  : 
; 492  : 	return sse2available;
; 493  : }

	pop	ecx
	ret	0
Added back in the sse2available code and the build worked and the browser worked.
Then added GL optimization back in and it crashed. So the problem is seen only
with GL optimization (from my perspective). I changed sse2available back to just
returning 1 and am building now.
Stop us from crashing on 386 and older 486 processors, which didn't
have the cpuid instruction.
Attachment #142867 - Attachment is obsolete: true
Comment on attachment 142994 [details] [diff] [review]
same + prevent using cpuid on older machines

r=me. But if we have problems with the new sse3 code and noone steps up really
fast to fix it lets just remove it.
Attachment #142994 - Flags: review+
i mean sse2 of course. All sse3 could should be immideatly disabled since such
instructions arn't supported by any cpu, or even invented :)
Attachment #142994 - Flags: superreview?(bryner)
Attachment #142994 - Flags: superreview?(bryner) → superreview+
Checked in.
Status: REOPENED → RESOLVED
Closed: 22 years ago20 years ago
Resolution: --- → FIXED
I get a compile error on windows ME, with VC++ 6.0

fatal error C1600: unsupported data type
make[1]: *** [jidctint.obj] Error 2

If I backout the change to jmorecfg.h, it compiles and links.
Do you have the service and processor packs that are in the build requirements?

  http://www.mozilla.org/build/win32.html#ss2.2
alex, tor:  I got this same build problem and getting the latest service and
processor packs fixed it for me.

I posted something to n.p.m.builds
(news://news.mozilla.org:119/404BA31C.1060905@mozilla.org) in case someone else
hits the same problem and googles for it.

I only have the standard edition, so I'll just stick with my patched tree to get 
around requiring an assembler. (Just not building crypto used to be enough :-/) 
(In reply to comment #53)
> alex, tor:  I got this same build problem and getting the latest service and
> processor packs fixed it for me.
> 

And what about those of us who're compiling with Cygwin, eh?  Yep, same damn
build problem :-)
(In reply to comment #55)
> (In reply to comment #53)
> > alex, tor:  I got this same build problem and getting the latest service and
> > processor packs fixed it for me.
> > 
> 
> And what about those of us who're compiling with Cygwin, eh?  Yep, same damn
> build problem :-)

We don't hit this when compiling with mingw gcc.  However, I think that may be a
fluke.  The standalone mingw compiler doesn't define _M_IX86.  It expects to get
that define from <windows.h>.  I'm guessing the cygwin gcc defines _M_IX86 by
default.  Besides the __declspec(align) issue, gcc will barf on the inlined
intel asm.  gcc uses the at&t syntax.
Attachment #144153 - Flags: superreview?(tor)
Attachment #144153 - Flags: approval1.7b?
Attachment #144153 - Flags: superreview?(tor) → superreview+
Attachment #144153 - Flags: approval1.7b? → approval1.7?
Comment on attachment 144153 [details] [diff] [review]
No mmx/sse2 for win32 gcc

a=asa (on behalf of drivers) for checkin to 1.7
Attachment #144153 - Flags: approval1.7? → approval1.7+
What does GCC have as far as SSE2 support goes? Does it have intrinsics? If so, it should be pretty easy to port it over.
Just curious, since Mr. Dotzler gave a= for this patch:

Will this change cause a visible increase in JPEG rendering speed on P4/SSE2
architectures? I have a P4-M laptop with a P4/SSE2-optimized 20040325 build and
can perform some rough benchmarks if needed.
In the bug that I found, I reported my performance testing results and found that the SSE2 code resulted in 30% less time in rendering JPEG images. I used three images that totalled about 11 MB with nothing else on the page and everything was loaded from RamDisk (Program, html and image files) except for the Profile.

The SSE2 code should be in any MSVC++ build after the code went in. You don't need an optimized build to get the SSE2 JPEG benefit.

What would be nice is if the code were ported to GCC and Linux and even nicer if it was ported from an Integer implementation to a Floating Point implementation as I think the latter would be more efficient.
I did some major surgery on the JPEG SSE2 code and brought the codepath down by
about 50%. The code path before was 600 instructions and it's now 316. I didn't
include the call/return overhead for the five procedures in my calculations so
the 600 number is low. There is no call/return overhead in my code as it's all
in one routine now. And I've made a lot of effort to reduce memory access.

I did three public builds along the way and users were pretty happy with the
results. Though I've made quite a few improvements since my last released build.

Question: how do I go about getting the code into Mozilla? I have patch files to
implement the change with documentation as to what I did to jidctint.c.
Michael, you should open another bug and submit your patch for review to get the
code into Mozilla.

About this bug, I really dislike the way the way the patch was implemented.

1) Why are those optimizations only done for Wintel platforms, rather than Intel
platforms ? There are other platforms that run on intel chips with SSE2, ie.
Linux, OS/2, Solaris x86 . Should they not benefit too ?

2) Why is the assembly code inline in C files ? Several C compilers on different
x86 platforms cannot process inline assembly. The assembly code should be in
separate .s or .asm files , not in C files .
(In reply to comment #63)
> Michael, you should open another bug and submit your patch for review to get the
> code into Mozilla.

I'll try to figure out how to open a bug to do that.

> About this bug, I really dislike the way the way the patch was implemented.
> 
> 1) Why are those optimizations only done for Wintel platforms, rather than Intel
> platforms ? There are other platforms that run on intel chips with SSE2, ie.
> Linux, OS/2, Solaris x86 . Should they not benefit too ?

Well, I don't personally have machines that run those operating systems.

> 2) Why is the assembly code inline in C files ? Several C compilers on different
> x86 platforms cannot process inline assembly. The assembly code should be in
> separate .s or .asm files , not in C files .

MSVC++ makes it really easy to do Intel extensions in that they support inline
assembly and intrinsics. Some of the nice things about inline is that you can
use your C variable in your assembler code and that it takes care of the saving
and restoring of registers in and out of your routines.

If you want, you can write a little C then a little Assembler, then a little C
and so on. If you're porting a routine, it's very nice in that you can convert
a little of the code at a time inline.

Is it the best way to do portable code? Only if you go by Bill's definition of
portable. But if I had the hardware and software, I imagine a port would be as
hard as doing a build.

I also have a port to SSE for Pentium about 40% done and have done a little work
on an MMX port.
> Well, I don't personally have machines that run those operating systems.

I don't know whether you care about propagating this code further than
Mozilla-on-Windows, but be notified: your chances of getting code accepted into
upstream libjpeg with the above approach are nil.  We did not sweat bullets to
create a portable library only so that cowboys with zero interest in portability
could slap random patches on top.
(In reply to comment #65)
> > Well, I don't personally have machines that run those operating systems.
> 
> I don't know whether you care about propagating this code further than
> Mozilla-on-Windows, but be notified: your chances of getting code accepted into
> upstream libjpeg with the above approach are nil.  We did not sweat bullets to
> create a portable library only so that cowboys with zero interest in portability
> could slap random patches on top.

Send me some hardware and I'll be happy to port it.
I figured out how to port this to GCC, at least for Linux. For those that are
wondering why no one has ported to GCC in the past: Microsoft Visual C++ Inline
Assembly support is much easier to use than what's in GCC. Another thing is that
it's relatively easy to find examples of MSVC++ Inline Assembly and MSVC++
Inline Assembly with MMX/SSE/SSE2 instructions. GCC Inline Assembly and GCC
SIMD Inline Assembly examples are few and far between on Newsgroups and on the
WWW in general. Though there's one more working example as of this morning.

I just need to line up someone that can build and test on P4 Linux. There are a
few Linux Unofficial builders on MozillaZine and perhaps one of them will
volunteer.

My P3 SSE port is done as well and there are lots of people using it in the form
of unofficial builds.

And I plan a port to Intrinsics for Windows 64 as Microsoft's development tools
don't support Inline Assembly for that platform. I just need to get my hands on
an Athlon 64 system.
I'm looking for a volunteer with a GCC Pentium 4 or Pentium 3 build environment
on Linux to work with me on the port of the SSE and SSE2 code. This will be an
iterative project with me porting and someone building and testing. At the moment,
I have the CPUID stuff done and am looking for a sanity check on that.

I've posted in the Unofficial Builders forum but don't have any responses yet.
This is an opportunity for those that want the SSE/SSE2 code working on Linux to
step up to the plate and help to get it done.
It may have caused major regression bug 247437 (see attached screenshot), the
other offender would be bug 137478
There are many talkback crashes in Mozilla 1.7 sse2 code (@ dct_8x8_inv_16s) ,
starting on 2004031615 (72 records from talkback), this is currently topcrash #5
for Mozilla 1.7.

URL in the comments are:
http://www.anandtech.com
http://www.diezeit.de
http://www.linux.org/
http://strasbourg.eauxvives.free.fr/phpBB2

I can't reproduce because I don't have SSE2 but it would be worth a try ...
Those sites all open fine for me. Could you send me the assembler before and
after the crash point? Also, is hardware information available?

(In reply to comment #70)
> There are many talkback crashes in Mozilla 1.7 sse2 code (@ dct_8x8_inv_16s) ,
> starting on 2004031615 (72 records from talkback), this is currently topcrash #5
> for Mozilla 1.7.
> 
> URL in the comments are:
> http://www.anandtech.com
> http://www.diezeit.de
> http://www.linux.org/
> http://strasbourg.eauxvives.free.fr/phpBB2
> 
> I can't reproduce because I don't have SSE2 but it would be worth a try ...
> 
Have there been any reports on FireFox?
Yes, firefox 0.9 2004061423

All crashes seems to occur on [Windows NT 4.0 build 1381], trigger reason is
"Illegal instruction", at
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/jpeg/jidctint.c&mark=716#716

It looks like the CPU supports SSE2, but NT4 doesn't like it, maybe NT4 needs a
special driver or service pack to enable SSE2 ?

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&threadm=ZZmc7.3239%24NW6.1508767%40news1.sttln1.wa.home.com&rnum=3&prev=/groups%3Fq%3Dwindows%2Bnt%2Bsse2%26hl%3Den%26lr%3D%26ie%3DUTF-8%26selm%3DZZmc7.3239%2524NW6.1508767%2540news1.sttln1.wa.home.com%26rnum%3D3

From: Jerry Coffin (jcoffin@taeus.com)
Subject: Re: SSE under WinNT
 
View this article only
Newsgroups: comp.lang.asm.x86
Date: 2001-08-09 00:56:09 PST

In article <ZZmc7.3239$NW6.1508767@news1.sttln1.wa.home.com>, ryan-
113@home.com says...

[ ... ] 

> I ran into this awhile ago, and ended up deciding that NT 4 Service Pack 5
> did not support SSE instructions.  I looked into it more today, and I ended
> up deciding again that NT4 sp5 (or at least my installation) doesn't support
> SSE.  I'd be very happy to find out that I'm wrong, since it's a large issue
> for me.  I'd also like to know why the OS has to support a particular
> instruction set.

The operating system has to save/restore the CPU registers during 
context switches.  There's a driver on the Intel web site to support 
SSE on NT 4.
What should the solution be? Require the driver or disable this for NT?

I have a non-SSE2 1.7 build at
http://www.pryan.org/mozilla/seamonkey/mmoy/Mozilla-1.7-Release-O1-G6-noSSE2.exe
that can be used as a temporary workaround.
(In reply to comment #75)
> What should the solution be? 
For now, filed bug 248509 with all the information

> Require the driver or disable this for NT?
For me the best solution would be to add SSE2 OS detection with MSVC build and
disable SSE2 for other Windows compiler. Be careful of the __try/__except result
on Win98 though.
I'll post further replies over there.
Depends on: 248509
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: