Crash @ dtoa.c when using Direct3D plugin - Regression for bug 358569

RESOLVED FIXED

Status

()

defect
--
critical
RESOLVED FIXED
10 years ago
4 years ago

People

(Reporter: iacobcatalin, Unassigned)

Tracking

({crash})

Trunk
x86
Windows XP
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090622 Minefield/3.6a1pre (.NET CLR 3.5.30729)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090622 Minefield/3.6a1pre (.NET CLR 3.5.30729)

This is a regression for Bug 358569 - Crash @ js_dtoa/jsdtoa.c when running with reduced CPU float precision, e.g. after loading Direct3D plugin. It is always reproducible on mozilla-central Mercurial tip 396f9743df7b from 22 June 2009 with our plugin that uses Direct3D. See bug 358569 for all details and fixes.

The reason for the regression is that the bug was fixed in jsdtoa.cpp but Mercurial commit a5fc387c4622 took the upstream dtoa.c and removed the dtoa code from jsdtoa.cpp. During the move the fix for this bug, and maybe more modifications were lost. Related to the same problem, some code was added to detect the situation in Debug builds and warn the plugin developers, see Bug 360282 - Warn about broken FPU. This was also lost in the above commit.

I suspect that this was also the cause for the Shockwave crash which I suppose placed dtoa in the table at https://wiki.mozilla.org/QA/Topcrashes since Shockwave can be set to use hardware acceleration via Direct3D. A relevant bug report may be: Bug 466659 - Crash on conquerorgame.com, possibly related to Bug 449118 , exception_access_violation [@ dtoa ]. If this one doesn't appear anymore it's probably because Adobe changed Shockwave to use the D3DCREATE_FPU_PRESERVE flag in the Direct3D CreateDevice function.

Reproducible: Always




about:buildconfig

Source

Built from http://hg.mozilla.org/mozilla-central/rev/396f9743df7b
Build platform
target
i686-pc-mingw32

Build tools
Compiler 	Version 	Compiler flags
cl 	14.00.50727.762 	-TC -nologo -W3 -Gy -Fdgenerated.pdb -DDEBUG -D_DEBUG -DDEBUG_catalin -DTRACING -Zi
cl 	14.00.50727.762 	-GR- -TP -nologo -Zc:wchar_t- -W3 -Gy -Fdgenerated.pdb -DDEBUG -D_DEBUG -DDEBUG_catalin -DTRACING -Zi

Configure arguments
--enable-application=browser --enable-debug --disable-optimize --with-windows-version=502

The crash is in js3250.dll.
Unfortunately the crash reporter doesn't kick in for me today so I can't submit a crash report but I will attach a stack trace.
so, afaiu, it's your job not to screw with the FPU mode, so if it's possible for you to fix your plugin not to mess with it, that'd be greatly appreciated.
I have recreated the two patches that were lost (fix crash and warn about wrong FPU mode) and attached them combined.

We can of course change our plugin, although according to MSDN - http://msdn.microsoft.com/en-us/library/bb172527(VS.85).aspx - the D3DCREATE_FPU_PRESERVE flag for Direct3D is not default because "Portions of Direct3D assume floating-point unit exceptions are masked; unmasking these exceptions may result in undefined behavior". Of course they don't say what those portions are and unlike Firefox we cannot debug Direct3D at source level so if we hit one of those undefined behavior situations now or in the future we'll be in a bad situation when trying to track the problem.

But regardless of our plugin, I think that the fix and warning should be in Firefox given that they were already accepted, got lost only by accident and they help plugin developers who hit the same issue realize where the problem lies. The current crash is also a (non critical indeed) security issue.
The crash is also fixed if Firefox is built with SSE or SSE2. Relevant part from about:buildconfig is: --enable-optimize=-O2 -arch:SSE and --enable-optimize=-O2 -arch:SSE2

It seems that in this case floating point operations are not done by the legacy x87 floating point unit but with SSE instructions instead.

Is there any chance that Firefox will be compiled in the "recent" future (for example in version 4.0) with SSE or SSE2 on Windows?
Assignee: nobody → general
Severity: normal → critical
Component: General → JavaScript Engine
Keywords: crash
Product: Firefox → Core
QA Contact: general → general
Version: unspecified → Trunk
(In reply to comment #4)
> The crash is also fixed if Firefox is built with SSE or SSE2. Relevant part
> from about:buildconfig is: --enable-optimize=-O2 -arch:SSE and
> --enable-optimize=-O2 -arch:SSE2
> 
> It seems that in this case floating point operations are not done by the legacy
> x87 floating point unit but with SSE instructions instead.
> 
> Is there any chance that Firefox will be compiled in the "recent" future (for
> example in version 4.0) with SSE or SSE2 on Windows?

Graydon, was there some recent discussion on SSE(2) optimizations?
I had an email exchange with David Gay, the author of dtoa.c, about whether it is possible for him to make dtoa.c also work with single precision. He said that instead I should be using http://www.netlib.org/fp/gdtoa.tgz which handles all precisions.

The README file in the archive above starts with "This directory contains source for a library of binary -> decimal and decimal -> binary conversion routines, for single-, double-, and extended-precision IEEE binary floating-point arithmetic". And a few lines below "The routines are generalizations of the strtod and dtoa routines".

Is there any reason why dtoa.c was used instead of gdtoa which seems more robust? Searching the bugtracker and with Google doesn't reveal anything.
I would lay good odds on gdtoa not existing back in 1994 or so when SpiderMonkey was first written, hence why it's not in use here.  The other possibility is that dtoa was chosen to reduce attack surface.
JS was created in May 1995, SpiderMonkey was a rewrite of the first so-called "Mocha" runtime done in fall 1996. No gdtoa.tgz then -- copyright and other date refs in its source start at 1998.

We're not looking to sink new costs for same functionality. We also do not aim to break ECMA-262 conformance by supporting single precision. Any other reason to pay the switching costs?

/be
(In reply to comment #8)
> We're not looking to sink new costs for same functionality. We also do not aim
> to break ECMA-262 conformance by supporting single precision. Any other reason
> to pay the switching costs?

Thanks for the explanation. I didn't know that the standard requires double precision but it does so you are right, there is no compelling reason to switch to gdtoa.

I have a question about the "blocking1.9.1" flag that was added to this bug. Does it mean that a fix for this has a chance of getting into Firefox 3.5.x? As far as I understand Firefox 3.5 corresponds to Mozilla 1.9.1 but I'm not sure.
We should submit our fix for bug 358569 to the author of
dtoa.c.  I wanted to do that earlier this year, but never
got around to it.
Do we still need to do anything here?  It looks sort of like bug 358569 caught what we wanted most out of it.
Status: UNCONFIRMED → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
(In reply to comment #11)
> Do we still need to do anything here?  It looks sort of like bug 358569 caught
> what we wanted most out of it.

I'm fine with closing the bug. We changed our plugin to not mess with FPU settings and it now works well on Firefox and I understand your reasons for not wanting plugin developers to change FPU settings.

The only thing that you might want to do, is check if the changes implemented for Bug 360282 - Warn about broken FPU are now in Firefox. My investigations back then showed that they were lost in commit a5fc387c4622. It would be nice if plugin developers could get a warning in Debug builds of Firefox.
This isn't worksforme.  This bug is all about the fact that the fix in bug 358569 was accidentally backed out when we switched to dtoa.c.  Comment 11 has that wrong, I think.

The fix for bug 360282 was also apparently backed out, making this sort of problem hard to diagnose....

Reopening.
Status: RESOLVED → REOPENED
Ever confirmed: true
Resolution: WORKSFORME → ---
What's the preferred solution, the change to the for loop in bug 358569, the warning for FPU precision in bug 360282, or both. The for loop change prevents the crash if, despite the warning in bug 360282, the FPU precision does get changed.

This brought back memories of bug 160602 when a VC++ 6 patch changed the FPU precision. And yes the crashes that result from this are often difficult to trace back.
We strongly consider doing both, imo...
Assignee: general → nobody
Both blocking bugs have been fixed.
Status: REOPENED → RESOLVED
Closed: 9 years ago4 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.