Closed Bug 160602 Opened 22 years ago Closed 22 years ago

Large integers, e.g. getTime(), causing crash at 0x39393929

Categories

(Core :: JavaScript Engine, defect)

x86
Windows 2000
defect
Not set
critical

VERIFIED FIXED

People

(Reporter: mike.campbell, Assigned: khanson)

Details

(Keywords: crash, Whiteboard: [Windows-only] [Related: bug 140852? ] fixed1.3)

Attachments

(5 files, 2 obsolete files)

STEPS TO REPRODUCE
1. Load http://www.oracle.com
2. Crash!
Note that the crash does not occur every time. Sometimes the page loads, other times the browser (and mail) crashes with a Windows exception violation error, and other times it crashes with no Windows error.
Note that this has failed with the Mozilla 1.1 beta as well as the nightly build labeled 2002073004
WFM with Win2K build 20020730. Reporter: which build and which Flash version do you use? Can you please use a Talkback-enabled build and send a Talkback report? After Talkback has sent the report, run mozilla/components/talkback.exe and add the Talkback ID from the crash report to this bug. Thanks
Severity: normal → critical
Keywords: crash, stackwanted
Mike has sent me three Talkback IDs: TB8669796Y TB8779296W TB86957678W The first two contained no stack traces! Looks like whatever caused the crash also prevented the Talkback code from working. The third one I could not locate in the Talkback database, and it looks like there is a typo in the incident number, as it is one digit longer than the others. I, too, have been unable to duplicate this crash -
Note that this problem ONLY occurs if javascript is enabled in the browser. Disabling javascript prevents the crash but the page does not load everything. Shockwave Flash 6.0 r40
Note that the 3rd talkback session is TB8695678W
Reporter: Can you please close Mozilla, temporarily remove "flashplayer.xpt" from your plugins directory and try again? If you still crash, remove "npswf32.dll".
There was no flashplayer.xpt (nor any flashplayer.* files anywhere) that I could remove. I did remove the "npswf32.dll" file but the crash still occurred. The current files in the plugins directory are: NPSWF32.dll nprdx5.dll nprpverplug.dll npnul32.dll nprjplug.dll pnmi3260.dll Note that the "npswf32.dll" was restored after it did not resolve the problem.
Note: I looked up that 3rd talkback session (TB8695678W). Unfortunately, once again there was no stack trace recorded! Here are some facts from the report, in case it helps:
Processor: GenuineIntel Pentium, 598 MHz, MMX
Operating System: Windows NT 5.0 build 2195
Service Pack: Service Pack 2
Physical Memory: 384.0 MB
Memory Status (Available / Total):
  Physical Memory: 169.1 MB / 384.0 MB
  Page File: 687.9 MB / 921.9 MB
  Virtual Memory: 2028.9 MB / 2047.9 MB
Mounted Drive Information (Type, Size, Free, File System):
  A: Removable, -, -, -
  C: Fixed, 19571.3 MB, 8232.0 MB, NTFS
  D: CD-ROM, -, -, CDFS
Network Card: 3Com 3C918 Integrated Fast Ethernet Controller (3C905B-TX Compatible)
Screen Configuration: 1024 x 768, 24 bits per pixel, 75 Hz, 8388608 Bytes
I still can't reproduce this with a recent build or with 1.1b. Do you see this crash when you open the main page or when you click links on the page? Have you installed Mozilla in a clean directory or over an older build? I thought that this was a Flash problem, since Talkback sometimes can't catch crashes in a plugin (Java, Flash...).
A new note. I can also duplicate the crash if I visit the url http://vh1.com. I get the same browser crash. A new talkback session has been uploaded with ID TB8993332Z with the capture of the crash from visiting vh1.com. Note that the current version of mozilla that I am using is 2002072104.
Mike: thanks again! Unfortunately, this latest incident (TB8993332Z) also comes up with your machine data but no stack trace!!! Meanwhile, I'm looking at http://www.oracle.com and http://www.vh1.com to see if there is any common principle involved between the two sites -
cc'ing Amar: are you able to crash on either http://www.oracle.com or http://www.vh1.com with your Win2K box? All you have to do is load each site, no other actions are required. We were wondering if it's a plug-ins problem, but we aren't sure. Thanks -
Neither http://www.oracle.com nor http://www.vh1.com crashes for me on WIN2K, with both the 2002-08-01-08-1.0 (branch) and 2002-08-01-08-trunk builds.
Another site that seems to demonstrate the same behavior on my machine is www.cnet.com. It crashes the browser just about every time I visit the page.
Mike: thanks. I can't remember if we've tried a new Mozilla profile yet. You can always bring up the Profile Manager (if it doesn't already come up automatically) by launching Mozilla from a console:
[(path to Mozilla)] ./mozilla -profilemanager
I wonder if the crash goes away by running under a brand-new profile. Sometimes old profiles get corrupted. It's worth a try - Also: do you have access to any other Win2K machine? Does Mozilla crash on that machine on these sites, too? Finally: you mentioned that the problem seemed to go away for a while. Is it possible that there is some background application that is running when the crashes occur? Like an audio program of some sort? And when that app is not running in the background, no problem?
I do not have another windows machine to test this on but did try it from a linux box. The browser does NOT crash from linux. I tried a new profile from the windows box but it still crashed so I don't think it is related to the profile. I don't know of anything that is or is not running when the crash occurs that is different from other times. In general I run the same stuff all day. From what I have noticed it seems the crash will occur immediately after starting up the browser. However, if I do some other work and then visit the problem sites (maybe 30-60 minutes later) the crash does not occur. Go figure.
cc'ing dbradley for advice. Note Mike's comment above:
> From what I have noticed it seems the crash will occur immediately after
> starting up the browser. However, if I do some other work and then visit the
> problem sites (maybe 30-60 minutes later) the crash does not occur. Go figure.
Have you ever heard of this type of behavior? Is it possible that some XPCOM components get lazily registered, and this process is not working correctly? Note the crashes Mike has experienced are very hard: no stack trace has been preserved in Talkback for any of them.
Some additional configuration info I'll add. I didn't mention it before as it should not affect anything but just to make sure: 1 - My internet connection goes through a Redcreek Ravlin II VPN hardware box and a linksys router. 2 - I also run the program Proxomitron on this win2k machine but have disabled its use and the browser still crashes.
This is an odd one. I bounced between VH1 and Oracle and had no problems, even ran it under Purify. The only thing I saw in common was that both sites had references to JS files. I didn't see anything else that appeared to be common between the two sites. The only question I can think to ask, is when it crashes, does the pop-up give you a DLL name? Maybe try clearing your cache. Lastly a grasp at a straw, try deleting the xpti.dat file in the components directory.
Attached image: Error dialog
No luck with the other suggestions. I cleared the cache (both memory and disk) and deleted the xpti.dat file. After restarting Mozilla the xpti.dat file was recreated but the crash still occurred when visiting oracle.com. Note that in the uploaded error the instruction is always the same when the browser crashes. The 0x77f83cab may not mean anything but it is the instruction always listed. Note also that I don't have to click OK to terminate the program. It has already died. I just press OK to get rid of the dialog.
The only interesting thing I see in the error message is that the address 0x39393929 represents a string "999)". No Mozilla string contains this, though, and I don't see it in the HTML of any of the sites in question. Do you know if that number changes (the second one in the error dialog)?
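The "999)" reading can be checked by splitting the address into its four bytes. This is a minimal sketch; the helper name `addressToAscii` is made up for illustration. It prints the characters both in the order the hex digits are written and in the order the bytes would sit in memory on little-endian x86.

```javascript
// Hypothetical helper: split a 32-bit value into four bytes and render
// each byte as an ASCII character.
function addressToAscii(addr) {
  const bytes = [
    (addr >>> 24) & 0xff, // most significant byte
    (addr >>> 16) & 0xff,
    (addr >>> 8) & 0xff,
    addr & 0xff,          // least significant byte
  ];
  return {
    asWritten: String.fromCharCode(...bytes),                 // hex-digit order
    inMemory: String.fromCharCode(...bytes.slice().reverse()), // little-endian order
  };
}

const decoded = addressToAscii(0x39393929);
console.log(decoded.asWritten); // "999)"
console.log(decoded.inMemory);  // ")999"
```

Either way the faulting address consists entirely of printable characters, which is often a hint that text was written over a pointer.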
Looks like the address of memory is always 0x39393929. I tried it several times and it always showed the same address no matter what site I visit.
*** Bug 161574 has been marked as a duplicate of this bug. ***
From the duped bug, slashdot.org is having the same problem.
Confirming bug: thanks to Mike's patience, and to dbradley, we've finally found a pattern! From the duplicate bug report:
> When trying to get to slashdot.org (and some other sites but slashdot is
> best example) Mozilla immediately closes (sometimes with "instruction at
> 0x77f83cab referenced memory at 0x39393929 <- this value is suspect)
> - Talkback doesn't get activated. It's not always - it's pretty random but
> frequent (like 30%).
> It started some time after 1.1 branch.
Note the other bug report was also made from a Windows 2000 box -
Status: UNCONFIRMED → NEW
Ever confirmed: true
Hi, I reported slashdot.org as having the same problems. A few additional notes:
- I have the same problem on 2 win2k boxes (on different connections; one is Duron/Soltek and the other is a genuine Dell/Pentium).
- I downloaded the site (with wget -p) but couldn't reproduce the error locally (I tried several times).
- It's always 0x39393929.
- Lots of such strings 999) are in the history file (and others).
- I cleaned the history (and all other files that could be regenerated) but it didn't help.
Have you installed Mozilla in a clean directory or over an older build? Do you get a stack trace (from drwatson) if you start c:\winnt\drwatson.exe? I'll provide an optimized build with symbols if that works.
See http://www.computerhope.com/software/drwatson.htm for how Dr. Watson works and how to find the log.
I removed the old build before installing the new one; now I have 2002080614. I know how drwtsn works but I guess it doesn't catch these crashes. Anyway I tried to reproduce the problem for the last 45 minutes without luck :) - I'll try again tomorrow (it's 2:30 am here).
Ok, my Mozilla crashed on http://www.mbank.com.pl but it doesn't generate a crash dump or log. I know there are two types of crash msgboxes on Windows: one has a yellow "warning" sign saying "...saving log" (and that's Dr. Watson's) and the other has a red "stop" sign - and that's the one we see when Mozilla crashes.
Can the people who are able to "reproduce" this crash report: 1. type of video card. 2. Windows version. Win2k sp? 3. Is QuickLaunch turned on? 4. Anything else you can think of ;-) Just trying to figure out what's common.
My video card shows as an ATI 3D Rage Pro AGP 2x Windows 2000 sp2 and now sp3 (crashes occur in both) quicklaunch is not enabled
1. type of video card. GeForce 2MX and other one the Dell system 2. Windows version. Win2k sp? win2000 sp2 + some pre-sp3 hotfixes win2000 sp3 3. Is QuickLaunch turned on? both on and off
Other note: I use the same profile (copied) on both boxes BUT I checked with a new clean profile and a new installation of Mozilla and it still crashes. Is there any other place that Mozilla uses to keep data (except the profile directory)? registry? %WINDIR%? %USERPROFILE%?
cc'ing jrgm in case he knows the answer to that -
Mozilla stores data in:
c:\programs\mozilla.org
C:\Documents and Settings\%username%\Application Data\Mozilla\
and only some minor (!) things in the registry (Windows integration, i.e. opening HTML files with Mozilla). Mozilla always searches your system for installed plugins, but apart from that there is no additional data stored by Mozilla.
Matti's answer is correct (with the slight addendum that the location of the profile is a bit different on win95/98, but we're talking win2k here so %USERPROFILE%\Application Data\Mozilla is the target). The stuff in the win32 registry is relatively minor. It wouldn't be my first concern in looking at this crash.
As Mike suggested:
> Note that this problem ONLY occurs if javascript is enabled in the browser.
> Disabling javascript prevents the crash but the page does not load everything.
Today I turned off javascript and browsed many pages - and Mozilla didn't crash - so it seems the bug IS javascript related. I'll continue this test tomorrow...
After another day of javascript-off browsing it seems to be true - something's wrong with JS. It worked well the whole day until I turned JS on (I wanted to submit one form) - it crashed immediately.
Browsing without JS turned on doesn't prove anything. The browser has a huge number of modules. Imagine this type of picture:

Module X ---> JS Module ---> Module Y ---> Module Z ---> Module W ---> ...
                  ^                             ^
                  |                             |
                  |                             |
          user shuts off JS         but the bug is actually here

Sure, shutting off JS will stop the bug, because the bug is farther down the chain of interdependencies. That doesn't mean there is a bug in JS !!! There conceivably could be; but not much can be learned by turning it off; it's too low-level in the browser to tell.
> - I downloaded the site (with wget -p) but couln't reproduce the error locally
> (I tried several times)
This seems like a clue, but it's hard to know what it means. You might download just the HTML and add a <base href="___"> to the <head>:
-----------
<html>
<head>
<base href="http://www.oracle.com/">
...
-----------
That would get Mozilla to load everything but the HTML from the network (which might be part of the problem). If so, you could then prune the HTML page down to a simpler page that crashes (which would be very helpful!). There could also be something happening with the cache; loading locally won't hit the cache. Also, since disabling JavaScript prevents the crash, you might try stepping through the JavaScript in the JavaScript debugger to see what the script is doing when it dies.
Ok, I have something more specific: it crashes ONLY when the file is opened via an http connection (not file://). I created the following file
<html>
<head>
<base href="http://www.infopoll.com">
<script language="javascript" src="/live/infopoll.js"></script>
</head>
<body><h1>hello</h1></body>
</html>
and put it on a local Apache server - now every access makes Mozilla crash. Another thing - it crashes more frequently when started with an empty cache (it looks like that, anyway).
Sorry - I was wrong. It crashes Mozilla w/o a server connection (just file open) - it seems there's something strange in this .js file. I'll try to find this evil line :)
Ok, finally got it - the following piece of code crashes Mozilla:
<script language="javascript">
var expdate = new Date();
var base = new Date(0);
expdate.setTime (expdate.getTime() + (24 * 60 * 60 * 1000));
</script>
Hope it will help you fix the bug :)
this looks like a dupe of bug 140544. are you using Macro Express or JS Virtual Pager?
Yes, it seems to be a dupe BUT I don't run either of those 2 tools. As I said before, I experienced this on two boxes, so I'll list the applications that I have on both:
- Trillian
- TClockEx (it may be suspect, I'll check after I post this)
- Tiny Personal Firewall
- Edit+
- IrfanView
- Windows Commander
- MS Office XP
- other common applications (I don't think it's the case) like winamp or getright
Anyway, in 140544 there's a short piece of JS code doing something with Date/time (like my example does). Does it mean anything?
Marcin: thank you for finding a reduced testcase for this bug! The JS code from bug 140544 is indeed very similar:
<SCRIPT language="javascript">
var now = new Date();
var tail = now.getTime();
document.write(tail);
</SCRIPT>
whereas your testcase is
<SCRIPT language="javascript">
var expdate = new Date();
var base = new Date(0);
expdate.setTime (expdate.getTime() + (24 * 60 * 60 * 1000));
</SCRIPT>
Mike:
1. Do you crash on either of these examples?
2. What applications besides Mozilla are running when you crash?
3. Any chance you are using Macro Express or JS Virtual Pager (cf. bug 140544)?
Note the reduced testcases from bug 140544 and this bug both involve the .setTime() method of Date objects. This returns a large integer (the number of milliseconds since 1970-01-01 GMT). For example, here is what I get right now:
(new Date()).getTime() ---> 1029166874593
We have an open bug in JS Engine on numbers like this. The bug shows up on the Windows OS only, and in optimized builds only; very similar to what has been reported here and in bug 140544:
bug 140852 "String(819187200000) == '8191871:0000' in xpcshell, browser"
I'm wondering if the issue there, though no crash is mentioned, might be related to the crashes that are occurring here. Note the only stack trace we've been able to get so far is from bug 140544:
------- Additional Comment_ #15 From Hal Black 2002-05-09 14:50 -------
Yes, here is the stacktrace:
NTDLL! 77f83b27()
NTDLL! 77f83bae()
NTDLL! 77f82f0b()
XPCOM! 10033c78()
03670488()
XPCOM! 1000b862()
f18b5608()
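One way to read the malformed string from bug 140852, offered as an interpretation rather than a confirmed diagnosis: a digit-by-digit converter produced one digit that was too small by one, so the following "digit" came out as 10, and 10 added to the character code of '0' is ':'. The sketch below uses hypothetical helper names and an assumed list of digit values; it only shows that those values render as the reported string while still summing to the intended number.

```javascript
// Render digit values the way a digit-by-digit converter would:
// each value is added to the character code of '0' (48).
function renderDigits(digits) {
  return digits.map((d) => String.fromCharCode(48 + d)).join("");
}

// Numeric value of a digit list, tolerating out-of-range "digits".
function digitsValue(digits) {
  return digits.reduce((acc, d) => acc * 10 + d, 0);
}

// Assumed digit values behind the string reported in bug 140852:
// a 1 where 2 was expected, followed by a 10 where 0 was expected.
const badDigits = [8, 1, 9, 1, 8, 7, 1, 10, 0, 0, 0, 0];

console.log(renderDigits(badDigits)); // "8191871:0000"
console.log(digitsValue(badDigits));  // 819187200000
```

If this reading is right, the conversion lost a carry but not the value itself, which fits a rounding problem in the quotient step rather than a wrong input.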
Whiteboard: [Duplicate of bug 140544? ]
Typo above: I meant to say |getTime|, not |setTime|.
cc'ing rogerl, khanson to ask: could the problem in bug 140852, "String(819187200000) == '8191871:0000' in xpcshell, browser", cause a crash? I suppose the thing to do now is run the above testcases under Purify on Win2K -
I ran the example from comment 45 under Purify and didn't have any problems. Unfortunately my build was compiled with /O2 and not /O1. I'll try again after getting it built using /O1. I would have expected that not to make that much of a difference, since I think /O1 is a subset of /O2 IIRC.
Both of the sample JS code snippets will cause my browser to crash with the same invalid memory referenced errors as the web pages that I've reported. Looks like you may be on to something here. Note that I do not have Macro Express or JS Virtual Pager running but I do have another pager called eDesk running. However, I have tested with eDesk shut down and the crash still occurs.
I can confirm bug 140852 on my box. I played with stuff like this:
var expdate = new Date(100000000001);
alert(expdate.getTime());
and got:
100000000000.:
but it does NOT crash Mozilla. I played with values for a while: for huge numbers I got NaN (expected result), sometimes the result contains garbage (like the above example) - but I got no crash. So maybe it's the Date() constructor that is guilty?
I found something more: var expdate = new Date(); alert(expdate.getFullYear()); gives 34583 hm... I took the red pill...
I forgot to mention that
var expdate = new Date(1029185125000); // the number is my box's timestamp*1000
alert(expdate.getFullYear());
gives correct answer (2002), again - something's wrong with Date()
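For comparison, this is what a correctly working engine is expected to return for the values quoted in this bug. It is a small sketch, not one of the original testcases; it uses getUTCFullYear() instead of getFullYear() only so the result does not depend on the local time zone, and the constant name DAY_MS is made up here.

```javascript
// One day in milliseconds, the offset used in the reduced testcase.
const DAY_MS = 24 * 60 * 60 * 1000;

// Timestamp quoted in the comment above (seconds * 1000).
const stamp = 1029185125000;
const when = new Date(stamp);

console.log(DAY_MS);                 // 86400000
console.log(when.getUTCFullYear());  // 2002
console.log(String(stamp));          // "1029185125000"
console.log(stamp > 0x7fffffff);     // true: too large for a signed 32-bit integer

// Same arithmetic as the crashing testcase: add one day and read it back.
const expdate = new Date(stamp);
expdate.setTime(expdate.getTime() + DAY_MS);
console.log(expdate.getTime() - stamp === DAY_MS); // true
```

Because such timestamps do not fit in 32 bits, the engine has to handle them as floating-point doubles, which is why number-to-string conversion is involved at all.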
Marcin: does this javascript:URL give you 34583 ??? javascript: alert((new Date()).getFullYear()); (Note: if anyone on the bug is unfamiliar with javascript:URLs, they are entered in the URL bar just like an http:URL)
Now it stopped working this way and shows 2002 (as was mentioned before, the crashes also do not happen every time). But
javascript: alert((new Date()).getTime());
crashed Mozilla. I'll try again to get this year wrong...
Mike and Marcin: thanks for testing this again! Will Mozilla still crash for you if you remove the alert() and just do this? javascript: (new Date()).getTime();
yes, it closes immediately.
Reassigning to JS Engine and cc'ing Brendan on this crasher. I have never been able to reproduce this, and the contributors' Talkback reports are all devoid of stack traces. The same has been reported in bug 140544. I will continue to try to reproduce this and see if I can get a debug stack trace - Note we have an open bug in JS Engine for large integers, which is what (new Date()).getTime() will produce. The bug shows up on the Windows OS only, and in optimized builds only; very similar to what has been reported here and in bug 140544: bug 140852 "String(819187200000) == '8191871:0000' in xpcshell, browser"
Assignee: Matti → khanson
Component: Browser-General → JavaScript Engine
QA Contact: asa → pschwartau
*** Bug 140544 has been marked as a duplicate of this bug. ***
cc'ing contributors from the duplicate bug 140544. Note they report their crashes depend on other software running in the background: the JS Pager virtual desktop (http://hem.fyristorg.com/jspage/), or Macro Express 3. Contributors here find these apps are not necessary to crash (or are these running as hidden processes, perhaps?) On Windows this can be investigated by doing Task Manager > Processes > (alphabetize the process names)
Meanwhile, Hal has come up with a binary stack trace, and is currently trying to get a debug stack trace:
------- Additional Comment_ #29 From Hal Black 2002-08-12 18:20 -------
"Yes, same crash... Note, this is only with JS pager on that I get this crash. Otherwise, no problems. Here's the stacktrace I get when running with build 2002071608 and using that javascript URL:
javascript: (new Date()).getTime();
NTDLL! 77f8e59a()
NTDLL! 77f8edc6()
NTDLL! 77f848a5()
XPCOM! 60eb1928()
02ba30c8()
XPCOM! 60e89a75()
f18b5608()
I download/compile/debug with newer versions if there's something you'd like to try. This is fairly reliably reproducable..."
Okay, verified (javascript URL crash) with latest build (2002081209), here's the stacktrace.
NTDLL! 77f8e59a()
NTDLL! 77f8edc6()
NTDLL! 77f848a5()
XPCOM! 60eb18cf()
027999f8()
XPCOM! 60e899e7()
f18b5608()
I haven't been able to get it to crash (ever) while running a debug build. Note that JS Pager must be running when Mozilla STARTS to cause the crash. Stopping it while Mozilla is running still causes the crash, and starting it after Mozilla doesn't cause the crash.
FYI, I've been able to reproduce this bug in an optimized build with symbols. I'm trying to make sense of what I'm seeing right now. Wouldn't do much good to post what I've found at this time, it's way too strange. Looks like the Windows message queue is getting corrupted, and I'm trying to figure out how and by who.
Ok, here's where I'm at so far. The FPU's CTRL register is getting clobbered. I traced this down to CoInitialize, but I've seen it happen in other calls as well, after commenting out CoInitialize. Could be those calls made calls to CoInitialize, though. What happens is the precision the FPU uses changes. Normally we run at 53 bits of precision but in this case it is getting bumped to 64. This then causes the rounding issue we see in this code. To verify this I manually assigned 0x027f to the register (the value when JS Pager is not running) within js_dtoa and the function did the conversion properly. So I think we have two problems here.
1. Why is the FPU's control register changing?
2. Why are we overflowing the buffer when we were given a size?
I don't know enough about the FPU and the instructions to know if this is a compiler issue, an OS issue, or a JS Pager not playing nice issue. I'm sure we could slam in some assembler around CoInitialize to save off this value and restore it, but I'd like to understand the problem a little better.
I tried an experiment. Wrapping the CoInitialize call isn't going to be enough. Apparently there are many things that Mozilla calls that also call CoInitialize that we have no control over. I also verified that this behavior occurs in other applications as well. I suspect this may be COM forcing higher precision for some of its types. What I still don't understand is how JS Pager is triggering this, unless it's registering something that needs this precision. So as I see it we have two options.
1. Get the floating point conversion code working with the higher precision numbers.
2. Wrap the code for Windows (or maybe Intel CPUs?) with assembler that saves off the precision, sets it to what we need, and then restores it.
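To make the 53-bit versus 64-bit distinction concrete, the precision setting lives in bits 8-9 of the x87 control word. The sketch below decodes those fields; the function name is made up, and 0x037f is used only as an assumed example of a 64-bit setting (it is the x87 power-on default), since the thread does not quote the actual clobbered value.

```javascript
// Decode the x87 FPU control word fields relevant to this bug.
function decodeControlWord(cw) {
  const precisionBits = { 0: 24, 2: 53, 3: 64 }; // encoding 1 is reserved
  const rounding = ["nearest", "down", "up", "toward zero"];
  return {
    exceptionMask: cw & 0x3f,                          // bits 0-5: masked exceptions
    precision: precisionBits[(cw >> 8) & 0x3] || null, // bits 8-9: precision control
    rounding: rounding[(cw >> 10) & 0x3],              // bits 10-11: rounding control
  };
}

// Value reported above for the normal case (JS Pager not running).
console.log(decodeControlWord(0x027f));
// { exceptionMask: 63, precision: 53, rounding: 'nearest' }

// Assumed example of a control word with 64-bit precision selected.
console.log(decodeControlWord(0x037f));
// { exceptionMask: 63, precision: 64, rounding: 'nearest' }
```

The two words differ in a single bit (0x0100), which is why the change is easy to miss unless the register is inspected directly.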
I would think that the floating point control register would be part of a "context switch"? As in, if that is a value set by an application, when the OS switches from one process to another, it should save it off and restore it when it comes back. Say you wrap the call, for instance... Well, if the context switches inside the wrapper, wouldn't that mess things up also? Maybe there is some initial value not being set by mozilla when it starts up? Does it assume the FPU control register is a certain value, but not set it to that on startup?
I wondered that myself, whether this FPU register wasn't getting preserved properly. So I ran a little test, and it is. It's not getting changed because it wasn't restored after the context switch. It appears that the code within CoInitialize is setting it. Somehow JS Pager running triggers this effect within CoInitialize. I wish the debugger would allow you to set breakpoints on that register, unfortunately it doesn't. I tried walking through the assembler, but it's pretty far in. I also checked whether it has this effect on other applications that call CoInitialize, and it does. I created a simple MFC app that called it, ran JS Pager and then the MFC app and it got set. I suspect most applications don't notice this, because floating point isn't used all that much, and when it is, usually small rounding errors don't matter that much.
I would expect, but don't know for sure that this register would be preserved on a thread basis as well as a process basis.
Now that I think of it, this may be an optimization issue. I ran a debug build and the FPU CTRL register has the same value. But the results it produces are "correct" and it runs fine. So it may be VC++ not correctly clearing something or taking some kind of short cut it shouldn't. I'll see if I can compare the assembler and see what might be the problem.
I don't know if this will help or not, but if you disable the JS Pager option of "Send to Desktop (x,y)" in the system menu the crash does not occur. The crash when running JS Pager *seems* related to the extra system menu items.
Joe also discovered this (from bug 140544): "Note: Mozilla1.0 did not have this problem. It started this behavior with 1.1alpha."
Attached patch: A possible solution (obsolete)
I'm running on about 10 minutes of sleep in the past 30 hours, so take this patch with that in mind ;-) From what I could tell the line L = (Long) (d / ds); had some rounding issues when the precision was increased. I added a 0.5 to d and I think that clears up the problem. Also there may be other similar issues in this function, but from what I can tell the rest looks ok. This will add to the time to this function, don't know if there's a more efficient way to achieve the same thing. In any case, I see no more crashes wh
David: thanks!!! cc'ing Daniel, Steve -
*** Bug 153402 has been marked as a duplicate of this bug. ***
David, Given the apparent change in precision in the FPU, you might give a glance at http://www.netlib.org/fp/gdtoa.tgz, which is Gay's dtoa code stretched to other precisions. I glanced there myself, but was not struck by anything obvious. None of the code there had your solution added. Since this code exists, however, I'd say that we should *not* rely on the dtoa code in Mozilla to work with anything other than 53 bit mantissas. You might also give a shout on the netlib mailing list (if that's even possible); see the contact info in bug 156253. --scole
I need to learn to read the comments ;-) The comments say to use _control87(PC_53, MCW_PC); Not sure how portable that is. I effected the same thing with assembler and it did fix the problem. Assuming this works on the Intel platforms we build on, do we sprinkle these around code dealing with floating point numbers? Could this trip up plugins that might rely on the 64 bit mode? Do we want to incur the overhead of trying to wrap the code and preserving the mode? Some things to think about. Kenton what's your take?
cc'ing Waldemar -
Whiteboard: [Duplicate of bug 140544? ] → [Related: bug 140852? ]
David, Setting the FPU environment to the VC++ default is a perfectly valid defense against corruption by other users. All of these routines in (dtoa) assume this default environment.
I agree, that's probably the easiest solution given the current situation. I wasn't sure how expensive the operation is. Should it be set to 53 bits and left that way? Or should we save off the previous state and restore it on exit? The _control87 seems to have support by VC++ and I see references to gcc and this function. I don't know if we can blanket all Intel-based environments or not. I'm thinking of OS/2 and BeOS. Looks like OS/2 supports it. This function allows us to set the flags and get the previous value so we could restore it on exit.
Attached patch: Patch to address the FPU precision (obsolete)
Another possible patch. This patch takes the approach of switching the FPU to a 53 bit mantissa aka IEEE compliant.
1. Should we worry about restoring it to the previous state?
2. Are there other areas using floating point math in the JS engine where this should be used?
3. I don't know that we can assume that once it is set, it will remain set, thus I set it on each call. We have no control over what plugin code or other API calls might do.
4. These calls to _controlfp don't appear to be cheap, not sure how expensive they are in a release build. I could do the inline assembler, it's easy enough to do, but wasn't sure how supported _asm was on the various Intel based compilers.
5. I used _controlfp rather than _control87 because at least according to the MS documentation _controlfp is more widely supported in non MS compilers.
Attachment #95120 - Attachment is obsolete: true
David, do you know if the "cheapness" varies between get and set or not? I'm thinking that if getting is cheaper than setting, then we probably want to do a get first, to see if we need to change the mode at all (since this probably will happen only rarely). We probably also would only perform a restore if the control word changed. --scole
David, The Javascript standard avoids issues of non standard arithmetic. It is expected to run in the default IEEE setting. It expects floating-point precision to be double (53 bits of mantissa), rounding to nearest, and no floating-point trap handlers enabled. This default environment is set in “js_InitRuntimeNumberState” in jsnum.c. Actually it doesn’t get set for all environments (bug 109286), but it does for Intel. This implies that the environment is getting changed after Javascript has started. The crash you are seeing could be caused by any of these non default settings. It is somewhat troublesome that it is happening. After you restore the settings to those originally encountered, the remainder of the Javascript program is vulnerable to a strange floating-point environment. What I’d like to see is an alert box that occasionally and randomly checks for non-standard fp environments and alerts us of this situation. At the minimum it should be included in debug builds. It should report all non standard settings.
js_InitRuntimeNumberState only sets the interrupt mask and doesn't touch the precision of the FPU. So maybe the macro FIX_FPU needs to be modified to change the precision as well as the interrupt mask. There's still a risk this could change after this call, although CoInitialize is called before this function. Changing the FIX_FPU macro might at least address 99% of the cases.
Here's a short patch that changes FIX_FPU macro to set the precision as well as the exception mask. I did a quick test that started JS Pager after CoInitialize was called, and the FPU didn't appear to get set back. So looks like this may take care of most of the cases for us.
Attachment #95579 - Attachment is obsolete: true
Comment on attachment 95725: Patch that changes FIX_FPU to set precision
r=khanson
I think this is the correct fix. I didn't realize precision wasn't getting set correctly.
Attachment #95725 - Flags: review+
Comment on attachment 95725: Patch that changes FIX_FPU to set precision
"mantissa", and period at the end of sentence in comment, please. Fix those nits and sr=brendan@mozilla.org. /be
Attachment #95725 - Flags: superreview+
Attached patch: Patch that was checked in
fix checked in. Do we want to try and get this on the 1.1 branch?
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
I'm not familiar with the bug fix scenario here. If the patch was checked in, does that mean it will be in the nightly builds? If so, should I just be able to download the latest build and try it out?
You can try tomorrow's nightly trunk build (ftp://ftp.mozilla.org/pub/mozilla/nightly/latest-trunk/)
Cc'ing some drivers to consider this for 1.0.1 and the 1.0 branch. I think we should take it, but I sr'd, so I don't count as a driver here. /be
Keywords: mozilla1.0.1
I downloaded Mozilla from the trunk builds (binaries dated 2002-08-20 8:53) and it hasn't crashed yet (I did all the tests plus some browsing). Results from .getTime() and others are valid. I just wonder whether that version contains David's fix; he announced the fix at 06:06 (how long does a build take?)
Windows usually builds pretty quickly, so I would think it had the change, but hard to know 100%.
Bad news. My Mozilla crashed on .getTime(). This is a trunk build; binaries are dated 2002-08-21 9:27. Do you know the assembly that is produced by _control87(MCW_EM | PC_53, MCW_EM | MCW_PC)? (So I could check whether this version contains the fix.)
Here it is. The key difference is the two numbers in the push statements.
bytes: 68 1F 00 0B 00 68 1F 00 09 00 8B 77 14 FF 15 A0 00 04 01
0102374F push 0B001Fh
01023754 push 9001Fh
01023759 mov esi,dword ptr [edi+14h]
0102375C call dword ptr [__imp___control87 (010400a0)]
Yeah, I got it in js3250.dll:
.600C374F: 681F000B00 push 0000B001F ;" ♂ ▼"
.600C3754: 681F000900 push 00009001F ;" ○ ▼"
.600C3759: 8B7714 mov esi,[edi][00014]
.600C375C: FF15A0000E60 call _control87 ;MSVCRT.dll
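For anyone else checking their own js3250.dll, the two pushed immediates can be derived from the control-word constants. The sketch below restates the values of MCW_EM, MCW_PC and PC_53 locally, on the assumption that they match the MSVC float.h header, so that it compiles anywhere; the SKETCH_ names and the helper functions are illustrative and are not part of the Mozilla source.

```c
#include <assert.h>

/* Assumed values, restated from the MSVC <float.h> header. */
#define SKETCH_MCW_EM 0x0008001FUL  /* exception (interrupt) mask bits */
#define SKETCH_MCW_PC 0x00030000UL  /* precision-control field         */
#define SKETCH_PC_53  0x00010000UL  /* 53-bit mantissa (double)        */

/* First argument of _control87(MCW_EM | PC_53, MCW_EM | MCW_PC):
   the new bit values. */
static unsigned long fix_fpu_new_bits(void)
{
    return SKETCH_MCW_EM | SKETCH_PC_53;
}

/* Second argument: the mask selecting which fields may change. */
static unsigned long fix_fpu_mask(void)
{
    return SKETCH_MCW_EM | SKETCH_MCW_PC;
}

/* Compare against the immediates seen in the disassembly. Arguments are
   pushed right to left, so the mask appears first. */
static void check_against_disassembly(void)
{
    assert(fix_fpu_mask() == 0x000B001FUL);     /* push 0B001Fh */
    assert(fix_fpu_new_bits() == 0x0009001FUL); /* push 9001Fh  */
}
```

A build that only sets the interrupt mask, as described earlier in this bug, would presumably show 8001Fh in both pushes instead.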
When it crashes, are you getting the 0x39393939 numbers, or is it something different? Trying to determine if this is the same crash or something different. Which test case are you using? What other software is running on your system at the time? I was testing with JS Pager.
It crashed with the standard 0x39393929 message box. As a test I used the "javascript: alert(new Date().getTime())" URL. Later, as a test, I removed almost all processes and stopped almost all services, and it still crashed. Later I tried harder at removing processes, and after I removed CTFMON.EXE it stopped crashing (I'll test it again). CTFMON is part of MS Office XP (it handles alternate keyboards etc.)
CTFMON may be causing some Windows API function we call to set the FPU back to 64-bit, just a guess. This must be occurring after the call to js_InitRuntimeNumberState. As Kenton stated, the JS numerical system is designed to work with IEEE doubles, and the FPU getting into this 64-bit mode is going to cause problems not only in this area but in others. I wonder if this is really a Mozilla problem, or maybe some bug in COM?
I tested w/o CTFMON and it crashed too. I'll do some more tests later...
Reopening since the problem still persists in some environments.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I am one of the original reporters. In the two or three builds since this bug was said to be resolved, Mozilla no longer crashes on washingtonpost.com or cnet.com, and a couple other pages, but there are definitely more crashes for me in general since the "fix". Coincidence that some new bug might be present in Mozilla now or is this related to the code change?
Yes, in recent builds there are some new bugs that are causing frequent crashes, which are unrelated to this issue. Fortunately this bug's unique 0x3939393? pointer value makes it pretty easy to distinguish from other crashes.
For those still affected by this problem, bug 140852 is where the bulk of the discussion/investigation is occurring. That bug deals with the colon. The failure cases that remain for both bugs all deal with floating-point errors and the js_dtoa function having problems when the error occurs on the low side rather than the high side.
Depends on: 140852
*** Bug 165580 has been marked as a duplicate of this bug. ***
Still pursuing this bug. I am going to put some test points in the code to detect unusual settings of the rounding direction, precision control and FPU exception handling settings.
*** Bug 167403 has been marked as a duplicate of this bug. ***
*** Bug 165220 has been marked as a duplicate of this bug. ***
*** Bug 165097 has been marked as a duplicate of this bug. ***
*** Bug 164880 has been marked as a duplicate of this bug. ***
Given that apparently there are still problems even with this patch in, and I've seen no reports of 1.0.x suffering from this (though it's entirely believable), I'm not going to give 1.0 branch approval yet. If any of the people who can reproduce this could try it with 1.0.1 and report here (good or bad) I'd _greatly_ appreciate it.
Since I was the one to originally open the bug I'll state that I have NOT had the browser to crash since the patch was checked in. I am able to access all of the sites that were causing my problems earlier with no problems. I am currently running 1.1 and not having any problems at all.
Um, what exactly is the evidence that bug 167403, bug 165220, bug 165097, and bug 164880 (the four recent duplicates) are in fact duplicates of this bug? For the most part, those bugs begin with "this crashed for me", followed by one or two comments by others of "WFM", and then "this bug has been marked...".
I checked into that, too. The common evidence in all of them seems to be the memory address 0x39393929. Don't know if that's enough evidence, but the "WFM" results are also typical of this particular crash: it's machine-dependent. Resummarizing to provide the memory address shown in the typical Windows alertbox for the crash: 0x39393929. See Comment #22 for the meaning of this address. Changing summary from "Browser crashes viewing web page" to "Large integers, e.g. getTime(), causing crash at 0x39393929" Also adding "Windows only" to the Status Whiteboard, as that seems to be the case in this bug and in every bug duped against it -
Summary: Browser crashes viewing web page → Large integers, e.g. getTime(), causing crash at 0x39393929
Whiteboard: [Related: bug 140852? ] → [Windows-only] [Related: bug 140852? ]
Yes, the memory address listed is pretty good evidence. All but bug 165220 made reference to the same memory address, 0x3939393? which is trademark of this bug. Bug 165220 might be something else, hard to know for sure.
I isolated the jsdtoa.c routine and performed dtoa (double to ASCII) conversions on some large numbers, 86400000 and 819187200000. I set the rounding direction to all four possible settings. Everything ran as expected. I then set the precision control to float (24 bits of mantissa). 86400000.0 ran as expected but 819187200000.0 crashed. It might be useful to write a small program that sets the FPU rounding precision to float and have it running while testing the JavaScript engine. It might also be useful to test for the correct rounding precision at the beginning of each call to jsdtoa (and exit with the appropriate error message if the rounding precision is corrupt). Some graphics program may be reducing the rounding precision for performance.
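A check of the kind suggested here can be approximated without touching the control word at all, by counting how many mantissa bits survive an addition. The following is a minimal sketch under that assumption; the function names are hypothetical, and because the volatile stores round every result to double, it detects only a reduced (24-bit) precision setting, not the 64-bit extended setting.

```c
#include <assert.h>
#include <float.h>

/* Count the significand bits that survive a double addition.
   Yields 53 in the default environment and 24 if the precision
   control has been reduced to float. */
static int effective_mantissa_bits(void)
{
    volatile double one = 1.0;
    volatile double eps = 1.0;
    volatile double sum;
    int bits = 0;

    for (;;) {
        sum = one + eps;   /* volatile store rounds the result to double */
        if (sum == one)
            break;         /* eps no longer changes the sum */
        eps = eps / 2.0;
        bits++;
    }
    return bits;
}

/* Nonzero when arithmetic behaves like full double precision. */
static int fp_precision_is_default(void)
{
    return effective_mantissa_bits() == DBL_MANT_DIG;
}

/* Debug-build guard that could sit at the top of a conversion routine. */
static void assert_default_precision(void)
{
    assert(fp_precision_is_default());
}
```

The loop costs several dozen floating-point operations per call, so it is better suited to debug builds or occasional spot checks than to every dtoa call.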
But any decent OS should not let another program's FPU settings mess over Mozilla's. Is the problem that some embedding, or some plugin, is calling into in-process code that messes with the FPU? We don't want the overhead of setting FPU control/status registers on every dtoa, of course. /be
There are two issues. The first, which the patch addresses, is the mantissa precision. When COM is initialized, some function deep inside it sets the FPU to the higher 64-bit precision. This only happens when certain programs are running (I have no idea why). The patch sets the precision in the JS numerics initialization, which addresses this problem. The second issue appears to be a compiler issue. Even with the precision properly set, some people still experienced the problem. Bug 140852 deals more with this second issue. Basically it looks like certain versions of the VC++ compiler optimize the floating-point instructions in such a way that the error compounds and creates problems during the conversion from double to string, resulting in a colon appearing in the translated number. I've been able to reproduce this last issue on my system in isolation (putting just the jsdtoa code in a small program of its own), but Waldemar was unable to reproduce it with the program I created. /Op (which chooses more precise floating-point math over speed) fixes the problem on my system. But we were trying to figure out the exact source of the problem to determine whether the /Op option was really the correct fix or was masking another problem.
If this does at least wallpaper the bug for many users (safely), we'll want it for the 1.0 branch. Does it do so?
In my opinion the patch is simple, safe, and correct regardless of the outcome of bug 140852. I just don't know how many people it will help.
Keywords: stackwanted
*** Bug 168664 has been marked as a duplicate of this bug. ***
Here's where my C2.dll came from: http://msdn.microsoft.com/vstudio/downloads/tools/ppack/default.asp It's the VC++ 6.0 Processor Pack. So I guess some of the build machines had this as well. I'm still uncertain if this is a "bug" in this VC++ update, or a bad assumption on the part of the jsdtoa implementation. Also, I expect we may see this same problem with VC++ 7.0, not that it is an issue at the moment.
David, Might it be useful to put some asserts in the source to check for corrupt precision settings?
It wouldn't hurt, but I'm not sure it's going to do a lot of good. Unless Mozilla developers run these types of programs while running a debug version of Mozilla, it's not going to be that preventative. Right now, with the patch in place, this takes care of the JS Pager issue, and probably other programs that cause the setting of precision within the call to CoInitialize. So outside of the C2.dll issue, unless a developer happens across another program, service, OS version, etc. that might affect this after CoInitialize, we're probably not going to see it until users come across some odd program or OS version that trips it up. And once we detect it, what do we do? Add yet another program to the list of programs not to run Mozilla with?
We know of several potential solutions, and I think we need to choose one, and then incorporate the assert into that check-in. I just don't know enough about floating-point math to know if the routine is making an errant assumption about the way the FPU is doing the calcs or whether this is a bug in this version of the compiler. I'm fine with any of these. I think we need to pick one and run with it. Since this doesn't seem to be generating talkbacks, this could be a bigger problem than we realize.
1. Build with the earlier C2.dll (can we detect this using the preprocessor?)
   May run into this again in VC++ 7.0
   Other people may have this DLL and build and encounter the problem
2. Use the /Op option
   May be a performance impact (could compile only jsdtoa.c)
3. Modify the js_dtoa code
   We may be coding around a VC++ specific
I agree with David. I'd probably pick option 3, setting the precision control to the default on every entry to dtoa in VC++.
David, your comment #126 looks cut-off. I don't think we should fiddle with FPU registers on every dtoa, or JS numeric performance will regress unacceptably. /be
The time spent in dtoa setting the fpu environment would be negligible compared with the time spent in a call to dtoa. The call would only be made before any conversion between binary and decimal. If speed were a concern we could simply test the environment and do nothing if correct, otherwise set the correct environment and issue a warning that a potentially bad environment was encountered.
khanson: I hope you're right, but measurements would be good to prove negligible performance hit. Too many times a seeming small change can prove troublesome, so we should benchmark real-world and worst-case synthetic cases. If reading the status/control register (or whatever it is) is faster than setting unconditionally, do read before a conditional write. /be
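The "read before a conditional write" idea could look roughly like the sketch below. It assumes the MSVC _controlfp call with its _MCW_PC and _PC_53 constants from float.h, and it compiles to a no-op on any other compiler or target; ensure_double_precision is a hypothetical name, not an existing function in the JS engine. Whether the read is actually cheaper than an unconditional write is exactly what would need to be benchmarked.

```c
#include <assert.h>
#include <float.h>

/* Returns 1 if the precision control had to be repaired, 0 if it was
   already correct (or if the target has no such control to inspect). */
static int ensure_double_precision(void)
{
#if defined(_MSC_VER) && defined(_M_IX86)
    unsigned int cw = _controlfp(0, 0);     /* mask of 0: read only */

    if ((cw & _MCW_PC) == _PC_53)
        return 0;                           /* common case: no write */

    _controlfp(_PC_53, _MCW_PC);            /* restore 53-bit mantissa */
    assert((_controlfp(0, 0) & _MCW_PC) == _PC_53);
    return 1;                               /* caller may log a warning */
#else
    return 0;
#endif
}
```

The return value gives the caller a place to hang the warning about a potentially bad environment without paying for it on the normal path.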
This is the assembler and diff with and without /Op. I haven't had a chance to track down an old C2.dll to compare the assembler generated there. It wasn't in SP5, only c2.exe, so I suspect it's in SP4 or SP3 or before. Adding _controlfp to the js_dtoa call isn't going to fix the problem we're seeing with this specific c2.dll issue. That has more to do with what's happening in bug 140852. Comment #102 was the last one concerning this particular bug's crash. I don't know if Marcia ever found if a specific program was causing a problem or not. I think we need to keep discussions about the 0x3939393? crash in this bug, and the : appearing in numbers in 140852. From what I see, there's no solid evidence that the FPU state is getting changed after the JS numerics is initialized. Also, I did a timing test and the js_dtoa function takes about 512 nanoseconds for the test number. With _controlfp added, that added another 16 nanoseconds, or almost 10% to the function. The numbers seem small, but 10% doesn't seem that small. /Op increased the js_dtoa time to 527 nanoseconds or an increase of 25 nanoseconds. I'm more concerned about bug 140852, even though it doesn't cause a crash.
> I don't know if Marcia ever found if a specific program
> was causing a problem or not.
I was not able to find if there's one specific application that causes the problem. I use David's /Op build .dll and it works fine.
I think the following is the key section of code:
  ; 1656 : L = (Long) (d / ds);
- fld ST(0)
+ fld1
  fdiv QWORD PTR _ds$[ebp]
+ fst QWORD PTR -76+[ebp]
+ fmul ST(0), ST(1)
  call __ftol
  mov DWORD PTR _L$[ebp], eax
@@ -1463,42 +1527,42 @@
  ; 1666 : if (i == ilim) {
- cmp edi, 1
- mov BYTE PTR [ebx], al
+ cmp ebx, 1
+ mov BYTE PTR [esi], al
  fmul QWORD PTR _ds$[ebp]
- lea esi, DWORD PTR [ebx+1]
+ lea edi, DWORD PTR [esi+1]
  fsubp ST(1), ST(0)
- je SHORT $L1556
- mov eax, esi
- sub eax, ebx
+ je SHORT $L1993
+ mov eax, edi
+ sub eax, esi
  mov DWORD PTR 16+[ebp], eax
-$L1370:
+$L1804:
  ; 1677 : }
  ; 1678 : break;
  ; 1679 : }
  ; 1680 : if (!(d *= 10.))
- fmul QWORD PTR __real@8@4002a000000000000000
+ fmul QWORD PTR __real@4024000000000000
  fld ST(0)
- fcomp QWORD PTR __real@8@00000000000000000000
+ fcomp QWORD PTR __real@0000000000000000
  fnstsw ax
- sahf
- je $L1614
- fld ST(0)
- fdiv QWORD PTR _ds$[ebp]
+ test ah, 68 ; 00000044H
+ jnp $L2052
+ fld QWORD PTR -76+[ebp]
+ fmul ST(0), ST(1)
  call __ftol
  mov DWORD PTR _L$[ebp], eax
  fild DWORD PTR _L$[ebp]
  add al, 48 ; 00000030H
- mov BYTE PTR [esi], al
- inc esi
+ mov BYTE PTR [edi], al
+ inc edi
  fmul QWORD PTR _ds$[ebp]
  inc DWORD PTR 16+[ebp]
- cmp DWORD PTR 16+[ebp], edi
+ cmp DWORD PTR 16+[ebp], ebx
  fsubp ST(1), ST(0)
- jne SHORT $L1370
-$L1556:
+ jne SHORT $L1804
+$L1993:
  ; 1667 : d += d;
The L = (long) (d / ds); generates nearly the same assembler using the old C2.dll and using the new one with /Op. The new version without /Op does a divide and then a multiply, while the others just do a divide. The assembler further down gets a bit more complex. So I think what's happening is that the new version of c2.dll takes some shortcuts that normally don't affect things. The description of the option is at the link below. Reading the description, and how the FPU registers are used more rather than memory, and how that increases precision, it sounds like this is probably the best solution IMO, since this routine relies on 53-bit precision. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccore/html/_core_.2f.Op.asp
So is there any consensus on what the appropriate solution is?
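To make the stakes of that one quotient concrete, here is a simplified digit loop assembled from the source lines quoted in the listing (L = (Long)(d / ds), the "add al, 48" that turns L into a character, and d *= 10.). It is a sketch for illustration and not the real js_dtoa; the function names are made up. The point is only that the loop trusts L to stay in 0..9: a quotient of 10 would come out as ':' (ASCII 0x3A), which matches the colon reported in bug 140852, though this thread does not establish that this is the exact failure path.

```c
#include <assert.h>

/* Mirrors "add al, 48": the quotient becomes a character by adding '0'.
   Nothing here checks that L is in the range 0..9. */
static char digit_char(long L)
{
    return (char)('0' + L);
}

/* Emit ndigits decimal digits of d, where ds is the power of ten that
   scales d into [0, 10). out must have room for ndigits + 1 chars. */
static void emit_digits(double d, double ds, int ndigits, char *out)
{
    int i;

    assert(ds > 0.0 && ndigits >= 0);
    for (i = 0; i < ndigits; i++) {
        long L = (long)(d / ds);   /* next digit, expected 0..9 */
        d -= L * ds;               /* remove it from the remainder */
        out[i] = digit_char(L);
        d *= 10.0;                 /* shift the next digit into place */
    }
    out[ndigits] = '\0';
}
```

With exact arithmetic, emit_digits(819187200000.0, 100000000000.0, 7, buf) fills buf with "8191872", while digit_char(10) returns ':'.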
In setting up my laptop I also noticed that the Win32 build instructions call for the processor pack in addition to SP5. So I think we'll have to go the /Op route unless someone wants to tackle making the code to work without /Op.
*** Bug 165131 has been marked as a duplicate of this bug. ***
*** Bug 182624 has been marked as a duplicate of this bug. ***
*** Bug 186249 has been marked as a duplicate of this bug. ***
*** Bug 187704 has been marked as a duplicate of this bug. ***
More info on bug report 187704: URL only fails when Mozilla browser window opened by clicking URL http://www.chez.com/sfaucourt/mediat_us.htm: from an open message in a Mozilla Mail window. Then fails repeatedly. mynews@pacbell.net
I'm not sure if this is really the same bug, since the IPF address is slightly different, but I got here from bug #153402, which has been marked as a duplicate of this one. Shortly after printing, the following occurs:
MOZILLA caused an invalid page fault in module <unknown> at 0000:39393939.
Registers:
EAX=0064fb2c CS=0167 EIP=39393939 EFLGS=00010246
EBX=0064fb2c SS=016f ESP=00550038 EBP=00550058
ECX=005500dc DS=016f ESI=8167febc FS=19e7
EDX=bff76855 ES=016f EDI=00550104 GS=0000
Bytes at CS:EIP:
Stack dump:
bff76849 00550104 0064fb2c 00550120 005500dc 00550210 bff76855 0064fb2c 005500ec bff87fe9 00550104 0064fb2c 00550120 005500dc 39393939 005502c8
The only thing I have done other than print a page is switch to a different browser tab (often viewing news.com). I have been getting this bug since Mozilla v1.1 and it is still present. It occurs every time I print, no matter what site is printed or what site I browse next, as far as I can see. I'm running Win98 SE fully patched and using Mozilla 1.2.1 - Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.2.1) Gecko/20021130. OS doesn't appear to be any more unstable than usual for Win98. :)
Just a refresher: the 0x39393939 is an indicator of a buffer overrun of ASCII 9's, each of which is hex 0x39. It's possible that some other code did this, if there is other code that converts floating point to ASCII and doesn't check the buffer size. I know I've had this crash in testing, where the last number varied. It depended on the number that was being converted.
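The arithmetic behind that address is easy to confirm. The sketch below (overrun_word is a hypothetical helper) shows the 32-bit value a saved pointer or return address takes on if four copies of the same digit character are written over it. It covers only the case where all four bytes are the same digit; variants such as 0x39393929 have a different low byte, which this sketch does not try to explain.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value of a 32-bit word after four copies of one ASCII digit have been
   written over it. Byte order does not matter, since all bytes match. */
static uint32_t overrun_word(char digit)
{
    char bytes[4];
    uint32_t word;

    assert(digit >= '0' && digit <= '9');
    memset(bytes, digit, sizeof bytes);   /* e.g. "9999" without a NUL */
    memcpy(&word, bytes, sizeof word);
    return word;
}
```

overrun_word('9') is 0x39393939, the same value that shows up as EIP and on the stack in the Win98 dump.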
*** Bug 188021 has been marked as a duplicate of this bug. ***
The bug I described in 187704 has not occurred since installing Windows 2000 critical update described below. 810649: Critical Update This update contains several fixes to Windows components to better support default Web browsers other than Internet Explorer, as described in Microsoft Knowledge Base (KB) Article 810649. Download now to improve the interaction of certain Windows components with default web browsers other than Internet Explorer. For more information about this issue, read Microsoft KB Article: 810649. (This site may be in English.) System Requirements This update applies to Windows 2000 with Service Pack 3. I don't know if this is related to the bug or not.
There were two causes of this. One was the floating-point precision changing on calls to CoInitialize; the other was that various large values fed to the function caused rounding issues in the presence of compiler optimizations. It's possible you were experiencing the CoInitialize flavor and this update might have "fixed" that.
*** Bug 180943 has been marked as a duplicate of this bug. ***
*** Bug 192002 has been marked as a duplicate of this bug. ***
*** Bug 189816 has been marked as a duplicate of this bug. ***
*** Bug 182789 has been marked as a duplicate of this bug. ***
*** Bug 190887 has been marked as a duplicate of this bug. ***
Note: this issue is now causing crashes for Windows users at http://slashdot.org, explaining the recent spate of duplicates -
What about David Bradley's fix (/Op for js*.dll) in bug 140852? It's known for almost 6 months, it works, without it my mozilla crashes twice a day. There're about 25 dupes for this...
Flags: blocking1.3?
Please also note Bug 180776 and Bug 185337; they're not necessarily dupes, though. There is an interesting problem with startrek.com, as the crash is time-delayed, i.e. I open the page and the browser crashes about a minute later. The symptoms (talkback not triggered) match as well. The Windows XP "crash-catcher" doesn't get invoked either. I don't know what prerequisites exist for the "contact Microsoft" code.
We should try to get this in for final. Gathering lots of dupes from slashdot crashers isn't good.
Flags: blocking1.3? → blocking1.3+
1.0.1 is gone.
Keywords: mozilla1.0.1
Patch was checked in for 140852, this bug can now be closed as well.
Status: REOPENED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Provisionally marking Verified, as I am seeing no new test failures with the fix for bug 140852. However, from the beginning, I was never able to reproduce the current bug in the browser. Could other contributors report back on that? If you use today's trunk build of the browser, have the crashes gone away? From the reports, slashdot.org seemed to expose the crash; it would be nice to know from the field that this no longer occurs - Thanks - (remember to use a build from 2003-02-18 or after!)
Status: RESOLVED → VERIFIED
Can anyone confirm that this crash has gone away? Thanks -
I can verify it has been fixed. Win XP. Build ID: 200302022008.
I'd only crashed a couple of times at slashdot before the fix, but I can say that I haven't crashed at all with nightlies (on win2k) since the fix.
No crash since fix, all testcases from this bug and from #140544 and #140852 work just fine. Thanks.
Whiteboard: [Windows-only] [Related: bug 140852? ] → [Windows-only] [Related: bug 140852? ] fixed1.3
*** Bug 80734 has been marked as a duplicate of this bug. ***