Closed Bug 538642 Opened 15 years ago Closed 15 years ago

The FPU exception handler should work on Windows (conflicts with breakpad, bug 533035 not actually fixed)

Categories

(Toolkit :: Crash Reporting, defect)

x86
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking1.9.2 --- .2+
status1.9.2 --- .2-fixed

People

(Reporter: benjamin, Assigned: benjamin)

References

Details

(Keywords: crash, topcrash, verified1.9.2, Whiteboard: [3.6.x][rc-ridealong][crashkill])

Attachments

(4 files)

Currently the FPU exception handler is installed before the breakpad one, which means (at least on Windows) that it just isn't getting run at all.

I can't just switch them, though, because while doing OOPP we discovered that the FPU exception handler doesn't chain correctly, so we'd break crash reporting.

In order to get this right, I need a test. 1) code to run in the testplugin which will set the FPU control words to a bad setting 2) JS code which will trigger a FPU exception
Flags: in-testsuite?
Comment on attachment 420781 [details] [diff] [review]
temporarily disable floating-point exception handler

Let's get this on m-c to unblock OOPP.
Attachment #420781 - Flags: review?(benjamin) → review+
Flags: wanted1.9.2+
Flags: blocking1.9.2-
Whiteboard: [3.6.x][rc-ridealong]
OS: Linux → Windows 7
Summary: The FPU exception handler should be installed after breakpad, and should chain correctly → The FPU exception handler should work on Windows (conflicts with breakpad, bug 533035 not actually fixed)
Switching the order in which the handlers are installed is in fact difficult, because the breakpad handler isn't installed until we read application.ini and the FPU handler is in a code block which needs to run right near the beginning of XRE_main. So instead I inserted a breakpad filter for the FPU exceptions.
Attachment #421075 - Flags: review?(ted.mielczarek)
Attachment #421076 - Flags: review?(ted.mielczarek)
Attachment #421077 - Flags: review?(ted.mielczarek)
Comment on attachment 421075 [details] [diff] [review]
Breakpad should skip FPU exceptions, rev. 1

As a result of this we'll never get crash reports for floating point exceptions, right? (I guess that was the point of bug 533035 anyway.)
Attachment #421075 - Flags: review?(ted.mielczarek) → review+
Yes, precisely.
Attachment #421076 - Flags: review?(ted.mielczarek) → review+
Comment on attachment 421077 [details] [diff] [review]
Test for FPU exceptions, rev. 1

I assume that you've verified that this test fails without the previous patches.
Attachment #421077 - Flags: review?(ted.mielczarek) → review+
http://hg.mozilla.org/mozilla-central/rev/5d89972b2ca0
http://hg.mozilla.org/mozilla-central/rev/918a5af21732
http://hg.mozilla.org/mozilla-central/rev/f3f4c57e8262

Yes, the test failed before the fix.
Status: NEW → RESOLVED
Closed: 15 years ago
Flags: in-testsuite? → in-testsuite+
Resolution: --- → FIXED
Attachment #421075 - Flags: approval1.9.2?
Attachment #421075 - Flags: approval1.9.2.1?
blocking1.9.2: --- → ?
status1.9.2: --- → ?
Flags: blocking1.9.2-
Keywords: crash, topcrash
Whiteboard: [3.6.x][rc-ridealong] → [3.6.x][rc-ridealong][crashkill]
blocking1.9.2: ? → .1+
Attachment #421075 - Flags: approval1.9.2?
We already shipped
blocking1.9.2: .1+ → .2+
Blocks: 537039
We should really get this in for 3.6.2; it fixes the fact that one of our top crash fixes didn't actually fix the crash it was supposed to be fixing.
Comment on attachment 421075 [details] [diff] [review]
Breakpad should skip FPU exceptions, rev. 1

a1922=beltzner, nice to see tests!
Attachment #421075 - Flags: approval1.9.2.2? → approval1.9.2.2+
dbaron offered to land this for me
Assignee: benjamin → dbaron
So, minor problem with that.  I did the merging to branch:  the patches are in my queue at 
http://hg.mozilla.org/users/dbaron_mozilla.com/patches-1.9.2/
except that we crash in the test added for this bug, right after printing:

109747 INFO TEST-PASS | /tests/toolkit/xre/test/test_fpuhandler.html | Undefined division-by-zero doesn't crash

I'll push another round to try with a printf to confirm that we hit the code in the testplugin (rather than that we're crashing in the testplugin).
(In reply to comment #15)
> I'll push another round to try with a printf to confirm that we hit the code in
> the testplugin (rather than that we're crashing in the testplugin).

I confirmed that we are hitting the code in the testplugin that changes the FPU state.  So I think it's that the patch just doesn't work when applied to 1.9.2.

That's more than I'm in a position to figure out right now.  Perhaps Ted could help?
Assignee: dbaron → benjamin
Is there any more info in the tinderbox log? I don't know of any obvious changes in the Breakpad Windows code between 1.9.2 and trunk.
So, that's bug 539530, which is also a problem on trunk. It seems that this is a problem if you run all the mochitests in one pass, but not a problem if you run them separated. I haven't recorded it to figure out what's going on.
Blocks: 539530
This is marked as blocking Firefox 3.6.2, and I wonder if given comment 15 through comment 18 we should be downgrading it to "needed", but I'm not sure how big a problem this is for crash analysis. Input/opinions?
its not so much a problem for crash analysis, since most of the analysis has been done.   It is a problem for users crashing.  js_Interpret was near the very top of top crashes for 3.6 betas before we made skiplist changes.  now we have a variety of signatures like https://bugzilla.mozilla.org/show_bug.cgi?id=537039  js_Interpret:904 and several more that rely on this fix to fix those crashes.

getting this final fix and enabling the other fixes to finally come into play will result in hundreds of fewer crashes per day.  I've venture to say this is the bigest bang for the buck we have on the crashkill hit list right now.
We're not currently running split mochitests on 1.9.2 or 1.9.1, so this test would fail if we landed it on branches. We could land bug 494165 on branches (it's just test harness changes) and RelEng could easily flip on split mochitests there.
OK, the patch is approved but a test would break. Code freeze for 1.9.2.2 is coming up - what do we need to do to get this landed? Should I lean on RelEng for bug 494165? If so, can we add that dependency?
So you think the patch still works on branch, it's just the test that's not working?  If so, my inclination would be to land with the test disabled.
Yes, I think it will work and that landing without the test is probably the best course of action for now.
I have a TODO to get the necessary patch on branches so we can enable split mochitests, and make the test not fail. Please land the test file on branch, but don't enable it in the makefile.
Let's do it. Code freeze is 11:59pm March 9th!
Blocks: 540959
Can we test the beta builds to make sure that this is now sending crashes, or at least not sending fewer crashes? Or has it already been tested?
is there a way to test this in the 3.6.2 beta build ?
You need the testplugin. You can use the one from trunk builds by downloading and unpacking http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/firefox-3.7a4pre.en-US.win32.tests.tar.bz2. The testplugin is in bin/plugins/nptest.dll and you can copy it into {profiledir}/plugins

Then visit http://benjamin.smedbergs.us/tests/test-fpuhandler.html and click "Run Test"
hm i did the steps from comment#30 but it says i have not the plugin installed, while the plugin manager says i have :/ http://img260.imageshack.us/img260/9933/screenshot20100322at103.png
np, thanks Benjamin !

Result from the test on the testpage : All tests passed

with:
Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2

verified for 1.9.2-2
Keywords: verified1.9.2
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: