Closed
Bug 224487
Opened 22 years ago
Closed 22 years ago
FPU control word gets clobbered by PM - can't start unless delete compreg.dat first
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: mrmazda, Assigned: jhpedemonte)
References
Details
Attachments
(2 files, 1 obsolete file)
|
6.84 KB,
patch
|
brendan
:
review+
|
Details | Diff | Splinter Review |
|
6.84 KB,
patch
|
Details | Diff | Splinter Review |
Mozilla/5.0 (OS/2; U; Warp 4.5; en-US; rv:1.6a) Gecko/20031101
To reproduce:
1-delete previous build
2-unzip new build
3-start mozilla
4-browse some pages
5-close mozilla
6-try to start mozilla
Actual results:
11-02-2003 15:04:39 SYS3183 PID 0041 TID 0001 Slot 006c
H:\MOZILLA\BIN\MOZILLA.EXE
c000009a
1e2b108d
EAX=02300f7f EBX=02430b80 ECX=0013f938 EDX=1e2b0d1e
ESI=023c4560 EDI=0013f938
DS=0053 DSACC=f0f3 DSLIM=ffffffff
ES=0053 ESACC=f0f3 ESLIM=ffffffff
FS=150b FSACC=00f3 FSLIM=00000030
GS=0000 GSACC=**** GSLIM=********
CS:EIP=005b:1e2b108d CSACC=f0df CSLIM=ffffffff
SS:ESP=0053:0013f900 SSACC=f0f3 SSLIM=ffffffff
EBP=0013f918 FLG=00012202
MOZJS.DLL 0001:0003108d
Expected results:
Functional Mozilla
Notes:
Works fine by simply deleting registry.dat, if it exists, before starting. I
tried with a fresh profile, but it also traps trying to run 'mozilla
-createprofile "profilename profiledir"'.
| Assignee | ||
Comment 1•22 years ago
|
||
I know what this is: we need to strengthen the code in javascript that sets the
FPU control word. Working on a fix.
Status: NEW → ASSIGNED
| Assignee | ||
Comment 2•22 years ago
|
||
To clarify, the workaround is to delete your components\compreg.dat before
startup (not registry.dat). Updating summary.
Summary: Can't start unless delete registry.dat first → Can't start unless delete compreg.dat first
| Assignee | ||
Comment 3•22 years ago
|
||
Brendan, this patch is basically an extension of the javascript patch in bug
215581. Although that patch was checked in, it was not entirely sufficient,
and now the trunk crashes on startup after the initial launch with a floating
point underflow exception.
Again, the problem is that PM calls (system function calls) are changing the
FPU control word from what was set by Mozilla at the start of a Javascript
context.
This patch (as compared to the one from bug 215581) adds a few more things:
- Sets the FPU control word in js_NewNumberValue (in addition to JS_strtod and
js_dtoa)
- Restores the FPU control word to the previous saved value at the end of these
functions. This was suggested by another developer, who stated we should play
nice, even if PM isn't.
- Does not optimize jsnum.c. Due to an optimizer bug in GCC, the _control87()
call was getting reordered after the floating point calls, resulting in a
crash.
I am still investigating whether this whole mess is a compiler bug (this did
not happen with Visual Age C++, only with GCC for OS/2). In the meantime,
though, the nightlies are in bad shape, and I'd like to get this fixed.
| Assignee | ||
Updated•22 years ago
|
Attachment #135213 -
Flags: review?(brendan)
Comment 4•22 years ago
|
||
Comment on attachment 135213 [details] [diff] [review]
js patch
I still think it's evil that the fpu mode can't be set persistently. How do
numerically intensive apps cope with any efficiency?
Use ok, not rv, as the name of JSBool result variables. Also, set ok from
return values to propagate errors, avoid if (!js_NewDoubleValue(...)) ok =
JS_FALSE. E.g. instead of this:
js_NewNumberValue(JSContext *cx, jsdouble d, jsval *rval)
{
jsint i;
+ JSBool rv = JS_TRUE;
+
+ SET_FPU();
if (JSDOUBLE_IS_INT(d, i) && INT_FITS_IN_JSVAL(i)) {
*rval = INT_TO_JSVAL(i);
} else {
if (!js_NewDoubleValue(cx, d, rval))
- return JS_FALSE;
+ rv = JS_FALSE;
}
- return JS_TRUE;
+
+ RESTORE_FPU();
+ return rv;
}
do this:
js_NewNumberValue(JSContext *cx, jsdouble d, jsval *rval)
{
jsint i;
+ JSBool ok;
+
+ SET_FPU();
if (JSDOUBLE_IS_INT(d, i) && INT_FITS_IN_JSVAL(i)) {
*rval = INT_TO_JSVAL(i);
+ ok = JS_TRUE;
} else {
ok = js_NewDoubleValue(cx, d, rval);
- return JS_FALSE;
}
- return JS_TRUE;
+
+ RESTORE_FPU();
+ return ok;
}
/be
brendan: here's what i would have hoped, except i believe that i'm wrong.
any fpu intensive app should put its fpu task onto a thread which doesn't call
ui functions, and hope that the os is responsible for preserving fpu state for
thread context switches.
the problem with this is of course that we (dbradley and I) think the os doesn't
do that. if that's true, then no matter what you're likely screwed in the event
*some* thread changes the state and doesn't manage to restore it before it loses
cpu control :).
Component: XPCOM Registry → JavaScript Engine
QA Contact: PhilSchwartau
Summary: Can't start unless delete compreg.dat first → FPU control word gets clobbered by PM - can't start unless delete compreg.dat first
Comment 6•22 years ago
|
||
This is one of the things we were hoping to avoid with the previous patch, but
apparently it is an issue.
My fear is that this only reduces the chance. What happens when a thread context
switch occurs within the set/restore area? I'm thinking for this to be solid
you'd need to prevent any context switches between the two. I know this is
possible under Windows, not sure about other operating systems.
The other issue is performance.
There was another solution and that was to make the function work in the
presence of higher precision. Testing such changes is difficult and that's why
we avoided that solution initially. But given the negatives of this solution,
maybe that should be revisited.
Component: JavaScript Engine → XPCOM Registry
Comment 7•22 years ago
|
||
Any OS that doesn't properly virtualize machine state among threads, so that FPU
mode is saved on context switch and restored on switch back to run that thread,
is just broken. Don't use broken OSes.
Fixing Component.
/be
Component: XPCOM Registry → JavaScript Engine
Comment 8•22 years ago
|
||
A thought on avoiding de-optimizing the compilation of jsnum.c on OS2. Maybe
you could use volatile appropriately (declare _control87 as volatile, or something).
/be
Comment 9•22 years ago
|
||
Don't mind me, I remembered it wrong. It wasn't another process that was setting
the control register and leaking it across, it was a COM side effect. I should
have went back and reviewed the bug instead of relying on my memory.
The question remains how big a perf impact is this? I guess given it's OS/2 only
it's may not be a big deal. From what I saw the _control87 call isn't trivial.
This scenario could possibly occur in other OS's.
| Reporter | ||
Comment 10•22 years ago
|
||
*** Bug 225346 has been marked as a duplicate of this bug. ***
| Assignee | ||
Comment 11•22 years ago
|
||
Updated patch based on comments. Also, are the masks for _control87() in
jsnum.h correct?
Attachment #135213 -
Attachment is obsolete: true
| Assignee | ||
Updated•22 years ago
|
Attachment #135213 -
Flags: review?(brendan)
| Assignee | ||
Updated•22 years ago
|
Attachment #135339 -
Flags: review?(brendan)
Comment 12•22 years ago
|
||
Comment on attachment 135339 [details] [diff] [review]
js patch v1.1
Use ok, not rc (or rv), for such JSBool variables (see the jsdtoa.c changes).
I have no idea what args to _control87 are correct for OS2, you tell me ;-).
Did you try volatile to avoid deoptimizing jsnum.o? Where is _control87
declared currently, and can you cause that declaration to be volatile, or
otherwise force the issue?
Fix the s/rc/ok/g nit and r=brendan@mozilla.org.
/be
Attachment #135339 -
Flags: review?(brendan) → review+
| Assignee | ||
Comment 13•22 years ago
|
||
I tried everything I could to work around the optimizer bug, but got nowhere.
I'll keep trying, and see if the optimizer can get fixed.
| Assignee | ||
Comment 14•22 years ago
|
||
This was the patch that was checked in, with changes based on Brendan's
comments.
| Assignee | ||
Comment 15•22 years ago
|
||
I'd like to keep this bug open while I investigate whether this is actually a
compiler bug, as I suspect. If it is and I can get the compiler fixed, then
we'll just back this crap out. Otherwise, I'll close this bug out.
Comment 16•22 years ago
|
||
I hadn't gotten a nightly build in a while so I got this one:
Mozilla/5.0 (OS/2; U; Warp 4; en-US; rv:1.6b) Gecko/20031205
I used this build for a short time & the problem does not appear in this build.
I have carefully NOT said the bug is cured. I can't speak to that because my
reading of this discussion is that it is pretty subtle & deep & I am not
qualified to speak to any of the internal issues. All I can say is that I am no
longer observing what I was observing. Unfortunately, due to Bug 225472, I have
dropped back to the 1.6 Alpha.
| Assignee | ||
Comment 17•22 years ago
|
||
Ok, it looks like this needs to be fixed in Mozilla and cannot be fixed in the
compiler or system. Closing as fixed.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Comment 18•22 years ago
|
||
Javier, can you say more about what you learned? Why is this necessary on OS/2
but not on similar OS/compiler platforms? Thanks.
/be
| Assignee | ||
Comment 19•22 years ago
|
||
Well, first I should mention that we (OS/2 Mozilla developers) are not the only
ones that are seeing this, and our compiler (GCC) isn't the only one affected.
The current speculation seems to point to the loading of some Presentation
Manager DLLs is changing the FPU control word. At the moment, no one has found
a possible workaround for this issue, other than what we are doing here.
Comment 20•22 years ago
|
||
This bug is PC, OS/2. If there are other platforms affected, don't we need
other ifdefs? I know, gcc may not be relevant, but I was really asking whether
the problem was in OS/2, and how the conversation went about fixing the PM....
/be
| Assignee | ||
Comment 21•22 years ago
|
||
Sorry about that. Yes, this is OS/2 only. The conversation is more or less
still on going, although no one is optimistic that it can be resolved by fixing
the system (Presentation Manager). It looks like on OS/2, developers will have
to be very careful when dealing with floating points, more so than on other
platforms.
Updated•20 years ago
|
Flags: testcase-
You need to log in
before you can comment on or make changes to this bug.
Description
•