Closed
Bug 125762
Opened 23 years ago
Closed 21 years ago
Optimized inverse discrete cosine transform (iDCT) functions for libjpeg
Categories
(Core :: Graphics: ImageLib, defect, P4)
Tracking
()
RESOLVED
FIXED
mozilla1.1alpha
People
(Reporter: cathleennscp, Assigned: cathleennscp)
References
Details
(Keywords: perf, Whiteboard: [adt1])
Attachments
(3 files, 2 obsolete files)
21.94 KB,
patch
|
cathleennscp
:
review+
scc
:
superreview+
|
Details | Diff | Splinter Review |
1.31 KB,
patch
|
sicking
:
review+
bryner
:
superreview+
|
Details | Diff | Splinter Review |
533 bytes,
patch
|
tor
:
superreview+
asa
:
approval1.7+
|
Details | Diff | Splinter Review |
Comment 1•23 years ago
|
||
hmmm what's this bug about? Any specifics? What about another bug ala "Faster
mozilla"...:)
Comment 2•23 years ago
|
||
Yeah, this bug is kinda vague... :-).
/be
I'd be very suprised to see a profile of mozilla where
libjpeg was a significant contributor.
Working with an extremely throttled proxy connection, I do
see that we don't seem to be generating paint events often
enough for smooth progressive loading.
Updated•23 years ago
|
Severity: normal → minor
Status: NEW → ASSIGNED
Priority: -- → P4
Target Milestone: --- → mozilla1.0
Comment 5•23 years ago
|
||
Moving Netscape owned 0.9.9 and 1.0 bugs that don't have an nsbeta1, nsbeta1+,
topembed, topembed+, Mozilla0.9.9+ or Mozilla1.0+ keyword. Please send any
questions or feedback about this to adt@netscape.com. You can search for
"Moving bugs not scheduled for a project" to quickly delete this bugmail.
Target Milestone: mozilla1.0 → mozilla1.2
Updated•23 years ago
|
Target Milestone: mozilla1.2 → mozilla1.0
Updated•23 years ago
|
Keywords: mozilla1.0,
nsbeta1
per adt, not critical for nsbeta1. hence minus.
Comment 7•23 years ago
|
||
we never got the info what this bug is about...
"faster jpeg decoder" is not very informative. Is this bug about some specific
imagelib code?
Please also file "faster imap code" bug...:)
this is critical for nsbeta1, see bugscape
http://bugscape.mcom.com/show_bug.cgi?id=12175
removing nsbeta1-
everyone, please be patient. more info soon.
pav/cathleen believe this is critical and almost ready. So a plus.
Comment 10•23 years ago
|
||
What's almost ready? Is jpeg decoding performance such a critical issue that we
want to get rid of our tried-and-tested-for-a-decade IJG code for some new
thing? If it didn't come out of IE, it feels like a lot of risk, and I haven't
seen a lot of people complaining about jpeg performance. (Most of the images on
the web, especially out of the top100, are GIFs, right?)
Someone convince me that this is worth the review, super-review and integration
cycles, when we have a boatload of serious imagelib bugs (5 crashes, including a
topcrash+) assigned to pavlov and missing 0.9.9 as I type this. I'm filled with
doubt, and find myself wishing, for the first time in my life, that I was pav's
manager. =)
Comment 11•23 years ago
|
||
calm down beavis. this is a small patch from an external contributor to
libjpeg. It isn't a all-new decoder. This code isn't a huge win. Raw JPEG
decoding (using djpeg) only showed about a 3% improvement on a 1.5GHz P4
machine. I don't believe this is at all critical for 0.9.9 but is something
that would be nice to have for 1.0.
Summary: faster jpeg decoder → Optimized inverse discrete cosine transform (iDCT) functions for libjpeg
Comment 12•23 years ago
|
||
If it's just a small patch, then why is it and the surrounding discussion
living in bugscape?
Comment 13•23 years ago
|
||
because it has licensed stuff from the external contributor. we're just waiting
on an ok from mitchell on the additional license stuff
Comment 14•23 years ago
|
||
shaver: be careful what you wish for :-) and as cathleen once said "everyone,
please be patient"
Comment 15•23 years ago
|
||
Let's see if I understand: we're looking at 3% improvement in pure jpeg
decode on a relatively small fraction of deployed machines (P4s), which
is likely dwarfed by the rest of mozilla's imaging pipeline.
To get this the legal people are messing around with licensing issues?
Surely they have better things to spend their time on. If Intel wants
to contribute their patch to libjpeg under the IJG terms that's one
thing, but let's not waste any more effort than is strictly necessary.
Assignee | ||
Comment 16•23 years ago
|
||
request for r/sr :-)
Attachment #75677 -
Attachment is obsolete: true
Assignee | ||
Comment 17•23 years ago
|
||
removed #if 0 from last patch.
it was added when we were trying to figure out compiling errs on machines with
no processor pack installed.
request r/sr
Comment 18•23 years ago
|
||
Comment 19•23 years ago
|
||
correcting previous error on attach ID #
r=james.rose@intel.com
on attachment #75989 [details] [diff] [review]
Comment 20•23 years ago
|
||
James Rose - don't take this personally, but this is the first time I've seen
your name in bugzilla. Could you briefly mention your qualifications for
reviewing this patch and disclose if you were involved in creating said
patch? Thanks.
Assignee | ||
Comment 21•23 years ago
|
||
Jame Rose is the Intel developer we're working with.
I asked him to review the patch to make sure it is matching what they're expecting.
Comment 22•23 years ago
|
||
ok this is a quick non authorative review.
1. IANAL, mitchell: the mozilla.org policy as i understood it was that new files under non tri license had to be stuck in other-licenses. There's a file here that clearly doesn't fit that policy.
2. The check is for Win32, x86 and __m128i. My guess is that __m128i indicates SSE2. but if that's the case then why the other checks. If that's not the case then we need some questions answered:
can this be true for BC5.5, DMC, MSVC5.2, or OpenWatcom
I just looked up __m128i using google, and it appears to be the right check. So I'm curious as to why you have the x86 check. If you're concerned about endianess, you should check for that...
[At the very least, whatever you do really shouldn't break any compilers that currently work, (that's MSVC5.2, 6, 7) -- I doubt they would, but I'm just checking on behalf of some concerned parties...]
Assuming someone had Intel's C++ compiler for linux working, could it use this optimization (if not for the win32 ifdef)?
This is bad:
+#ifdef HAVE_SSE2_INTEL_MNEMONICS
+ if(SSE2Available == 1)
+ {
+ method_ptr = jpeg_idct_islow_sse2;
+ method = JDCT_ISLOW;
+ }
+ else
+ {
+ method_ptr = jpeg_idct_islow;
+ method = JDCT_ISLOW;
+ }
+#else
+ method_ptr = jpeg_idct_islow;
+ method = JDCT_ISLOW;
I'd suggest that we do this instead:
+#ifdef HAVE_SSE2_INTEL_MNEMONICS
+ if(SSE2Available == 1)
+ method_ptr = jpeg_idct_islow_sse2;
+ else
+#endif
+ method_ptr = jpeg_idct_islow;
+ method = JDCT_ISLOW;
Note that your (I'm using this generically, I have no idea who wrote the patch because that stuff is hidden in bugscape) patch results in 3 code paths, all of which set one field to the same value, and results in two identical code paths, which might need to be maintained.
the
#ifdef DCT_IFAST_SUPPORTED
case should also be changed so we don't have to maintain three code paths.
--
If someone was trying to make the code in the new file line up, they failed. Could someone either strip whitespace two single spacing + indentation rules, or make everything almost line up? (I suspect there are tabs involved. tabs are taboo)
+* and are shifted to the left for rise of accuracy
'rise'?
+* Dequantize 8x8 block of DCT coefficients
+* Inverse DCT transform, de-quantization and level shift
i suppose dequantize is a word, and de-quantization is some accepted flavor, but I couldn't find it in m-w.com or dictionary.com, I'll go try to learn what it is this evening.
Comment 23•23 years ago
|
||
James Rose said in private email that he did the integration of intel jpeg
code into libjpg for this patch. We don't allow people to review their own
patches - you'll need another reviewer.
Assignee | ||
Comment 24•23 years ago
|
||
True, James Rose contributed the code, but we generated the patches, so i
thought it would be a good idea to make sure the patch we generated is exactly
what they proposed. (and he did catch an err earlier)
anyway, since i have tried Intel's code and tested, and even though I'll be
checking it into mozilla for Intel, I'll put my stamp on it.
r=cathleen on attachment #75989 [details] [diff] [review]
Attachment #75989 -
Flags: review+
Keywords: mozilla1.0 → mozilla1.0-
Comment 25•23 years ago
|
||
Comment on attachment 75989 [details] [diff] [review]
P4 SSE2 JPEG optimization
timeless' comments are correct, though one would hope that the compiler will
hoist the constant assignment; as always, I will gripe about post-fix
incrementing used when prefix is intended ... but these are minor nits not
enough to stop this from going in. This code is reasonable. The question
about whether it should be built in the mozilla case must take into
consideration the size cost (~ 2k? ... Cathleen, can you follow up with
numbers?) vs. the speed win and how often we get it. Currently, I understand
that mozilla won't build this code path, since we would need the processor pack
to make the compiler do it. Even if we had the processor pack, it looks like
we could configure this off in autoconf by forcing |HAVE_SSE2_INTEL_MNEMONICS|
off. That means someday, drivers will have another decision to make about
whether to turn this on or not. In the meantime, sr=scc
Attachment #75989 -
Flags: superreview+
Comment 26•23 years ago
|
||
wrt mozilla actually building this path, we might. (I ran into this stuff again
in some other venue, so i've been thinking about it a bit.) I think that
systems that build with crypto might actually have the processor pack because
it might be a requirement of nss.
this comment posted with nc4 instead of qnx voyager (i'm sorry about the
earlier one not wrapping).
Assignee | ||
Comment 27•23 years ago
|
||
code size:
original - 74,240 b
w/ patch - 73,728 b (a bit smaller, possibly optimized by combo of service pack
5 and processor pack)
pageload on P4 2.2GHz
original - 296
w/ patch - 296
pageload on p2 200MHz
original - 2688
w/ patch - 2689
* SSE2 code will not get compiled if Processor Pack is not installed on
compiling machine.
* SSE2 code will not get executed on machines other than P4 with SSE2 chips
* JPEG decompression on large JPEG ( 1MB or larger ) is about 5-7% improvement.
although internal page-load test not showing difference.
Assignee: pavlov → cathleen
Status: ASSIGNED → NEW
Comment 28•23 years ago
|
||
This has been nsbeta1- with Gagan, and marked by Mozilla as 1.0-. Pls let us
know why we should be taking this at this time?
Comment 29•23 years ago
|
||
Marking adt1.0.0- on behalf of the ADT. We're at a point where we can't take
the risk associated with this into Mach V. We should get this into the trunk
after Mozilla branches for 1.0.
Assignee | ||
Comment 30•23 years ago
|
||
patch landed on mozilla trunk on 4/29/02, for post 1.0 releases.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Target Milestone: mozilla1.0 → mozilla1.1alpha
Comment 32•21 years ago
|
||
Ok, apparently this new code was never turned on (which would explain the
result in comment 27) because defined(__m128i) isn't the right test.
Looking through the web it seems like the _M_IX86 test should be enough
to test for presence of the processor pack needed to compile the SSE2 code.
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Comment 33•21 years ago
|
||
*** Bug 236288 has been marked as a duplicate of this bug. ***
Comment 34•21 years ago
|
||
You can find builds and their descriptions at http://forums.mozillazine.org/viewtopic.php?t=54487
You can find performance tests and results showing at the bug that was just marked a duplicate (236288).
I have a build with the SSE2 JPEG stuff in it that runs and I will test it on
non-SSE2 builds later tonight. I'll also upload so that anyone that cares to can
try it out to ensure that it doesn't break other Windows processor/OS platforms.
Comment 35•21 years ago
|
||
This causes SSE2 support to be compiled on Visual Studio 6 (with SP5 and
processor pack, as specified on the mozilla.org build instructions) and
Visual Studio .net 2003.
Comment 36•21 years ago
|
||
Here's a build with SSE2 JPEG enabled that should run fine on any WinTel platform:
http://www.pryan.org/mozilla/firefox/mmoy/FireFoxB-Experimental-2004-03-03-D__m128i=1.exe
I've already tested it on one of my PIIIs. Test away on older machines if you have the time and the inclination.
Comment 37•21 years ago
|
||
(In reply to comment #35)
> Created an attachment (id=142867)
> change check for sse2
>
> This causes SSE2 support to be compiled on Visual Studio 6 (with SP5 and
> processor pack, as specified on the mozilla.org build instructions) and
> Visual Studio .net 2003.
There's a comment in Bug 236288 where this code changes causes a problem with
SeaMonkey. This code change in SeaMonkey either crashes on startup or crashes
after going to one or two pages. So perhaps some a flag could be used so that
this only gets built for FireFox.
I'm working on a FireFox build right now but will do a debug SeaMonkey build
tomorrow to see if I can find the problem over there. Of course if someone
else is planning on doing this, send me an email and I won't bother.
Comment 38•21 years ago
|
||
Crashing in the jpeg code is likely due to bug 137478 - swalker reports
that with the latest patch from there things seem to be working fine with
either O1 or O2.
Comment 39•21 years ago
|
||
(In reply to comment #38)
> Crashing in the jpeg code is likely due to bug 137478 - swalker reports
> that with the latest patch from there things seem to be working fine with
> either O1 or O2.
I did three SeaMonkey builds with the 2003-10-24 05:46 PST jdapimin.c patch and they all failed. I've done a number of builds with FireFox with the same patch that had no problems.
I have a FireFox build running right now with the older jdapimin.c patch in it.
I'll fire up a SeaMonkey debug build with the new patch from today after the
FireFox build finishes so we can find out if the crash goes away or not. If the crash is still there, we should have a stack trace to look at.
The crash in SeaMonkey was with GL-G7-O2-arch:SSE2 optimization which is about as aggressive as you can get. I'll probably take out GL as this may not work well with debug.
I got the feeling that the bug was fixed with the older patch and that today's patch was more of a better way of doing it rather than a change to fix a crash problem with the older patch. Again, let me know if you disagree or plan to do the build and test.
Comment 40•21 years ago
|
||
Well, the builds didn't go so well last night. See the FireFox build forums
if you're interested in the gory details. I also blew away my build environment
by accident but had a backup which I restored.
The SeaMonkey build is running right now off of a stable tarball that I pulled
down a few days ago. The debug option apparantly forces O2 optimization to O1
so I'll have to do another build without debug to test O2.
Comment 41•21 years ago
|
||
The debug build dies in dbgheap with an assertion. Continueing generates another
assertion. I didn't see anything obvious in the stack trace (first time I've used
this version of Visual Studio for debugging).
I'm going to take out debug and all optimizations and just turn on __m128i to see
if this works. If it does, I'll add back things until it breaks again.
I'll save the debug build as it might be useful later on. If anyone wants to play
with it, let me know and I'll put it up tonight.
Comment 42•21 years ago
|
||
Taking out debug and the optimizations gave me the No Bindings for XBL objects
or something similar to that. I'm wondering if the problem is with the SSE2
test code. The patch that you referred to does testing for MMX capability and
the previous patch worked fine. Now that we're enabling SSE2, I'm wondering
if it's a problem in the sse2support routine at the bottom of jdapimin.c.
I changed the code to just set it to 1 instead of calling sse2support to
eliminate that mixed C and assembler code as a cause of the problem and the
build is off and running again.
Comment 43•21 years ago
|
||
Finally got a SeaMonkey build to work. Optimization was arch:SSE2, O2 and G7. I
left GL out to save build time. I modified the routines at the bottom of
jdapimin.c to return 1 and this worked.
Next up is a build with the latest jdapimin.c patch (MMX) with the SSE2 routine
just returning a 1. If this works, then I'll put the SSE2 code back in which
should either show or refute a problem with that code.
Comment 44•21 years ago
|
||
I put the mmx code back into jdapimin.c with the latest patch and the build worked. Note that the sse2 code just returns a 1 (sse2 available).
I just started a build with the sse2 code added back in. If this one crashes,
then that's a pretty good indication that the problem is in the sse2support
routine. This build has the -FAs switch to generate a listing of the assembler
code which may make spotting the problem easy. Here's the code from jdapimin.c
for reference.
int sse2support()
{
int sse2available = 0;
int my_edx;
_asm
{
mov eax, 01
cpuid
mov my_edx, edx
}
if (my_edx & (0x1 << 26))
sse2available = 1;
else sse2available = 2;
return sse2available;
}
Comment 45•21 years ago
|
||
; 479 : {
push ecx
push ebx
; 480 : int sse2available = 0;
; 481 : int my_edx;
; 482 : _asm
; 483 : {
; 484 : mov eax, 01
mov eax, 1
; 485 : cpuid
cpuid
; 486 : mov my_edx, edx
mov DWORD PTR _my_edx$[esp+8], edx
; 487 : }
; 488 : if (my_edx & (0x1 << 26))
mov eax, DWORD PTR _my_edx$[esp+8]
and eax, 67108864 ; 04000000H
neg eax
sbb eax, eax
add eax, 2
pop ebx
; 489 : sse2available = 1;
; 490 : else sse2available = 2;
; 491 :
; 492 : return sse2available;
; 493 : }
pop ecx
ret 0
Comment 46•21 years ago
|
||
Added back in the sse2available code and the build worked and the browser worked.
Then added GL optimization back in and it crashed. So the problem is seen only
with GL optimization (from my perspective). I changed sse2available back to just
returning 1 and am building now.
Comment 47•21 years ago
|
||
Stop us from crashing on 386 and older 486 processors, which didn't
have the cpuid instruction.
Attachment #142867 -
Attachment is obsolete: true
Comment on attachment 142994 [details] [diff] [review]
same + prevent using cpuid on older machines
r=me. But if we have problems with the new sse3 code and noone steps up really
fast to fix it lets just remove it.
Attachment #142994 -
Flags: review+
i mean sse2 of course. All sse3 could should be immideatly disabled since such
instructions arn't supported by any cpu, or even invented :)
Attachment #142994 -
Flags: superreview?(bryner)
Updated•21 years ago
|
Attachment #142994 -
Flags: superreview?(bryner) → superreview+
Comment 50•21 years ago
|
||
Checked in.
Status: REOPENED → RESOLVED
Closed: 23 years ago → 21 years ago
Resolution: --- → FIXED
Comment 51•21 years ago
|
||
I get a compile error on windows ME, with VC++ 6.0
fatal error C1600: unsupported data type
make[1]: *** [jidctint.obj] Error 2
If I backout the change to jmorecfg.h, it compiles and links.
Comment 52•21 years ago
|
||
Do you have the service and processor packs that are in the build requirements?
http://www.mozilla.org/build/win32.html#ss2.2
Comment 53•21 years ago
|
||
alex, tor: I got this same build problem and getting the latest service and
processor packs fixed it for me.
I posted something to n.p.m.builds
(news://news.mozilla.org:119/404BA31C.1060905@mozilla.org) in case someone else
hits the same problem and googles for it.
Comment 54•21 years ago
|
||
I only have the standard edition, so I'll just stick with my patched tree to get
around requiring an assembler. (Just not building crypto used to be enough :-/)
Comment 55•21 years ago
|
||
(In reply to comment #53)
> alex, tor: I got this same build problem and getting the latest service and
> processor packs fixed it for me.
>
And what about those of us who're compiling with Cygwin, eh? Yep, same damn
build problem :-)
Comment 56•21 years ago
|
||
(In reply to comment #55)
> (In reply to comment #53)
> > alex, tor: I got this same build problem and getting the latest service and
> > processor packs fixed it for me.
> >
>
> And what about those of us who're compiling with Cygwin, eh? Yep, same damn
> build problem :-)
We don't hit this when compiling with mingw gcc. However, I think that may be a
fluke. The standalone mingw compiler doesn't define _M_IX86. It expects to get
that define from <windows.h>. I'm guessing the cygwin gcc defines _M_IX86 by
default. Besides the __declspec(align) issue, gcc will barf on the inlined
intel asm. gcc uses the at&t syntax.
Comment 57•21 years ago
|
||
Attachment #144153 -
Flags: superreview?(tor)
Attachment #144153 -
Flags: approval1.7b?
Attachment #144153 -
Flags: superreview?(tor) → superreview+
Updated•21 years ago
|
Attachment #144153 -
Flags: approval1.7b? → approval1.7?
Comment 58•21 years ago
|
||
Comment on attachment 144153 [details] [diff] [review]
No mmx/sse2 for win32 gcc
a=asa (on behalf of drivers) for checkin to 1.7
Attachment #144153 -
Flags: approval1.7? → approval1.7+
Comment 59•21 years ago
|
||
What does GCC have as far as SSE2 support goes? Does it have intrinsics? If so, it should be pretty easy to port it over.
Just curious, since Mr. Dotzler gave a= for this patch:
Will this change cause a visible increase in JPEG rendering speed on P4/SSE2
architectures? I have a P4-M laptop with a P4/SSE2-optimized 20040325 build and
can perform some rough benchmarks if needed.
Comment 61•21 years ago
|
||
In the bug that I found, I reported my performance testing results and found that the SSE2 code resulted in 30% less time in rendering JPEG images. I used three images that totalled about 11 MB with nothing else on the page and everything was loaded from RamDisk (Program, html and image files) except for the Profile.
The SSE2 code should be in any MSVC++ build after the code went in. You don't need an optimized build to get the SSE2 JPEG benefit.
What would be nice is if the code were ported to GCC and Linux and even nicer if it was ported from an Integer implementation to a Floating Point implementation as I think the latter would be more efficient.
Comment 62•21 years ago
|
||
I did some major surgery on the JPEG SSE2 code and brought the codepath down by
about 50%. The code path before was 600 instructions and it's now 316. I didn't
include the call/return overhead for the five procedures in my calculations so
the 600 number is low. There is no call/return overhead in my code as it's all
in one routine now. And I've made a lot of effort to reduce memory access.
I did three public builds along the way and users were pretty happy with the
results. Though I've made quite a few improvements since my last released build.
Question: how do I go about getting the code into Mozilla? I have patch files to
implement the change with documentation as to what I did to jidctint.c.
Comment 63•21 years ago
|
||
Michael, you should open another bug and submit your patch for review to get the
code into Mozilla.
About this bug, I really dislike the way the way the patch was implemented.
1) Why are those optimizations only done for Wintel platforms, rather than Intel
platforms ? There are other platforms that run on intel chips with SSE2, ie.
Linux, OS/2, Solaris x86 . Should they not benefit too ?
2) Why is the assembly code inline in C files ? Several C compilers on different
x86 platforms cannot process inline assembly. The assembly code should be in
separate .s or .asm files , not in C files .
Comment 64•21 years ago
|
||
(In reply to comment #63)
> Michael, you should open another bug and submit your patch for review to get the
> code into Mozilla.
I'll try to figure out how to open a bug to do that.
> About this bug, I really dislike the way the way the patch was implemented.
>
> 1) Why are those optimizations only done for Wintel platforms, rather than Intel
> platforms ? There are other platforms that run on intel chips with SSE2, ie.
> Linux, OS/2, Solaris x86 . Should they not benefit too ?
Well, I don't personally have machines that run those operating systems.
> 2) Why is the assembly code inline in C files ? Several C compilers on different
> x86 platforms cannot process inline assembly. The assembly code should be in
> separate .s or .asm files , not in C files .
MSVC++ makes it really easy to do Intel extensions in that they support inline
assembly and intrinsics. Some of the nice things about inline is that you can
use your C variable in your assembler code and that it takes care of the saving
and restoring of registers in and out of your routines.
If you want, you can write a little C then a little Assembler, then a little C
and so on. If you're porting a routine, it's very nice in that you can convert
a little of the code at a time inline.
Is it the best way to do portable code? Only if you go by Bill's definition of
portable. But if I had the hardware and software, I imagine a port would be as
hard as doing a build.
I also have a port to SSE for Pentium about 40% done and have done a little work
on an MMX port.
Comment 65•21 years ago
|
||
> Well, I don't personally have machines that run those operating systems.
I don't know whether you care about propagating this code further than
Mozilla-on-Windows, but be notified: your chances of getting code accepted into
upstream libjpeg with the above approach are nil. We did not sweat bullets to
create a portable library only so that cowboys with zero interest in portability
could slap random patches on top.
Comment 66•21 years ago
|
||
(In reply to comment #65)
> > Well, I don't personally have machines that run those operating systems.
>
> I don't know whether you care about propagating this code further than
> Mozilla-on-Windows, but be notified: your chances of getting code accepted into
> upstream libjpeg with the above approach are nil. We did not sweat bullets to
> create a portable library only so that cowboys with zero interest in portability
> could slap random patches on top.
Send me some hardware and I'll be happy to port it.
Comment 67•21 years ago
|
||
I figured out how to port this to GCC, at least for Linux. For those that are
wondering why no one has ported to GCC in the past: Microsoft Visual C++ Inline
Assembly support is much easier to use than what's in GCC. Another thing is that
it's relatively easy to find examples of MSVC++ Inline Assembly and MSVC++
Inline Assembly with MMX/SSE/SSE2 instructions. GCC Inline Assembly and GCC
SIMD Inline Assembly examples are few and far between on Newsgroups and on the
WWW in general. Though there's one more working example as of this morning.
I just need to line up someone that can build and test on P4 Linux. There are a
few Linux Unofficial builders on MozillaZine and perhaps one of them will
volunteer.
My P3 SSE port is done as well and there are lots of people using it in the form
of unofficial builds.
And I plan a port to Intrinsics for Windows 64 as Microsoft's development tools
don't support Inline Assembly for that platform. I just need to get my hands on
an Athlon 64 system.
Comment 68•21 years ago
|
||
I'm looking for a volunteer with a GCC Pentium 4 or Pentium 3 build environment
on Linux to work with me on the port of the SSE and SSE2 code. This will be an
iterative project with me porting and someone building and testing. At the moment,
I have the CPUID stuff done and am looking for a sanity check on that.
I've posted in the Unofficial Builders forum but don't have any responses yet.
This is an opportunity for those that want the SSE/SSE2 code working on Linux to
step up to the plate and help to get it done.
Comment 69•21 years ago
|
||
It may have caused major regression bug 247437 (see attached screenshot), the
other offender would be bug 137478
Comment 70•21 years ago
|
||
There are many talkback crashes in Mozilla 1.7 sse2 code (@ dct_8x8_inv_16s) ,
starting on 2004031615 (72 records from talkback), this is currently topcrash #5
for Mozilla 1.7.
URL in the comments are:
http://www.anandtech.com
http://www.diezeit.de
http://www.linux.org/
http://strasbourg.eauxvives.free.fr/phpBB2
I can't reproduce because I don't have SSE2 but it would be worth a try ...
Comment 71•21 years ago
|
||
Those sites all open fine for me. Could you send me the assembler before and
after the crash point? Also, is hardware information available?
(In reply to comment #70)
> There are many talkback crashes in Mozilla 1.7 sse2 code (@ dct_8x8_inv_16s) ,
> starting on 2004031615 (72 records from talkback), this is currently topcrash #5
> for Mozilla 1.7.
>
> URL in the comments are:
> http://www.anandtech.com
> http://www.diezeit.de
> http://www.linux.org/
> http://strasbourg.eauxvives.free.fr/phpBB2
>
> I can't reproduce because I don't have SSE2 but it would be worth a try ...
>
Comment 72•21 years ago
|
||
Have there been any reports on FireFox?
Comment 73•21 years ago
|
||
Yes, firefox 0.9 2004061423
All crashes seems to occur on [Windows NT 4.0 build 1381], trigger reason is
"Illegal instruction", at
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/jpeg/jidctint.c&mark=716#716
It looks like the CPU supports SSE2, but NT4 doesn't like it, maybe NT4 needs a
special driver or service pack to enable SSE2 ?
Comment 74•21 years ago
|
||
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&threadm=ZZmc7.3239%24NW6.1508767%40news1.sttln1.wa.home.com&rnum=3&prev=/groups%3Fq%3Dwindows%2Bnt%2Bsse2%26hl%3Den%26lr%3D%26ie%3DUTF-8%26selm%3DZZmc7.3239%2524NW6.1508767%2540news1.sttln1.wa.home.com%26rnum%3D3
From: Jerry Coffin (jcoffin@taeus.com)
Subject: Re: SSE under WinNT
View this article only
Newsgroups: comp.lang.asm.x86
Date: 2001-08-09 00:56:09 PST
In article <ZZmc7.3239$NW6.1508767@news1.sttln1.wa.home.com>, ryan-
113@home.com says...
[ ... ]
> I ran into this awhile ago, and ended up deciding that NT 4 Service Pack 5
> did not support SSE instructions. I looked into it more today, and I ended
> up deciding again that NT4 sp5 (or at least my installation) doesn't support
> SSE. I'd be very happy to find out that I'm wrong, since it's a large issue
> for me. I'd also like to know why the OS has to support a particular
> instruction set.
The operating system has to save/restore the CPU registers during
context switches. There's a driver on the Intel web site to support
SSE on NT 4.
Comment 75•21 years ago
|
||
What should the solution be? Require the driver or disable this for NT?
I have a non-SSE2 1.7 build at
http://www.pryan.org/mozilla/seamonkey/mmoy/Mozilla-1.7-Release-O1-G6-noSSE2.exe
that can be used as a temporary workaround.
Comment 76•21 years ago
|
||
The Intel article on this is at:
http://support.intel.com/support/processors/pentium4/sb/CS-001642-prd483.htm
Comment 77•21 years ago
|
||
(In reply to comment #75)
> What should the solution be?
For now, filed bug 248509 with all the information
> Require the driver or disable this for NT?
For me the best solution would be to add SSE2 OS detection with MSVC build and
disable SSE2 for other Windows compiler. Be careful of the __try/__except result
on Win98 though.
Comment 78•21 years ago
|
||
I'll post further replies over there.
You need to log in
before you can comment on or make changes to this bug.
Description
•