The FPU exception handler should work on Windows (conflicts with breakpad, bug 533035 not actually fixed)

RESOLVED FIXED

Status

()

Toolkit
Crash Reporting
RESOLVED FIXED
8 years ago
8 years ago

People

(Reporter: Benjamin Smedberg, Assigned: Benjamin Smedberg)

Tracking

({crash, topcrash, verified1.9.2})

unspecified
x86
Windows 7
crash, topcrash, verified1.9.2
Points:
---
Dependency tree / graph
Bug Flags:
wanted1.9.2 +
in-testsuite +

Firefox Tracking Flags

(blocking1.9.2 .2+, status1.9.2 .2-fixed)

Details

(Whiteboard: [3.6.x][rc-ridealong][crashkill])

Attachments

(4 attachments)

(Assignee)

Description

8 years ago
Currently the FPU exception handler is installed before the breakpad one, which means (at least on Windows) that it just isn't getting run at all.

I can't just switch them, though, because while doing OOPP we discovered that the FPU exception handler doesn't chain correctly, so we'd break crash reporting.

In order to get this right, I need a test. 1) code to run in the testplugin which will set the FPU control words to a bad setting 2) JS code which will trigger a FPU exception
Flags: in-testsuite?
Created attachment 420781 [details] [diff] [review]
temporarily disable floating-point exception handler
Attachment #420781 - Flags: review?(benjamin)
(Assignee)

Comment 2

8 years ago
Comment on attachment 420781 [details] [diff] [review]
temporarily disable floating-point exception handler

Let's get this on m-c to unblock OOPP.
Attachment #420781 - Flags: review?(benjamin) → review+
Pushed http://hg.mozilla.org/mozilla-central/rev/256c51e15d67
Flags: wanted1.9.2+
Flags: blocking1.9.2-
Whiteboard: [3.6.x][rc-ridealong]
(Assignee)

Updated

8 years ago
OS: Linux → Windows 7
Summary: The FPU exception handler should be installed after breakpad, and should chain correctly → The FPU exception handler should work on Windows (conflicts with breakpad, bug 533035 not actually fixed)
(Assignee)

Comment 4

8 years ago
Created attachment 421075 [details] [diff] [review]
Breakpad should skip FPU exceptions, rev. 1

Switching the order in which the handlers are installed is in fact difficult, because the breakpad handler isn't installed until we read application.ini and the FPU handler is in a code block which needs to run right near the beginning of XRE_main. So instead I inserted a breakpad filter for the FPU exceptions.
Attachment #421075 - Flags: review?(ted.mielczarek)
(Assignee)

Comment 5

8 years ago
Created attachment 421076 [details] [diff] [review]
testplugin function to enable FPU exceptions, rev. 1
Attachment #421076 - Flags: review?(ted.mielczarek)
(Assignee)

Comment 6

8 years ago
Created attachment 421077 [details] [diff] [review]
Test for FPU exceptions, rev. 1
Attachment #421077 - Flags: review?(ted.mielczarek)
Comment on attachment 421075 [details] [diff] [review]
Breakpad should skip FPU exceptions, rev. 1

As a result of this we'll never get crash reports for floating point exceptions, right? (I guess that was the point of bug 533035 anyway.)
Attachment #421075 - Flags: review?(ted.mielczarek) → review+
(Assignee)

Comment 8

8 years ago
Yes, precisely.
Attachment #421076 - Flags: review?(ted.mielczarek) → review+
Comment on attachment 421077 [details] [diff] [review]
Test for FPU exceptions, rev. 1

I assume that you've verified that this test fails without the previous patches.
Attachment #421077 - Flags: review?(ted.mielczarek) → review+
(Assignee)

Comment 10

8 years ago
http://hg.mozilla.org/mozilla-central/rev/5d89972b2ca0
http://hg.mozilla.org/mozilla-central/rev/918a5af21732
http://hg.mozilla.org/mozilla-central/rev/f3f4c57e8262

Yes, the test failed before the fix.
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Flags: in-testsuite? → in-testsuite+
Resolution: --- → FIXED
(Assignee)

Updated

8 years ago
Attachment #421075 - Flags: approval1.9.2?
Attachment #421075 - Flags: approval1.9.2.1?

Updated

8 years ago
blocking1.9.2: --- → ?
status1.9.2: --- → ?
Flags: blocking1.9.2-
Keywords: crash, topcrash
Whiteboard: [3.6.x][rc-ridealong] → [3.6.x][rc-ridealong][crashkill]
blocking1.9.2: ? → .1+
status1.9.2: ? → wanted
Attachment #421075 - Flags: approval1.9.2?
We already shipped
blocking1.9.2: .1+ → .2+

Updated

8 years ago
Blocks: 537039
We should really get this in for 3.6.2; it fixes the fact that one of our top crash fixes didn't actually fix the crash it was supposed to be fixing.
Comment on attachment 421075 [details] [diff] [review]
Breakpad should skip FPU exceptions, rev. 1

a1922=beltzner, nice to see tests!
Attachment #421075 - Flags: approval1.9.2.2? → approval1.9.2.2+
(Assignee)

Comment 14

8 years ago
dbaron offered to land this for me
Assignee: benjamin → dbaron
So, minor problem with that.  I did the merging to branch:  the patches are in my queue at 
http://hg.mozilla.org/users/dbaron_mozilla.com/patches-1.9.2/
except that we crash in the test added for this bug, right after printing:

109747 INFO TEST-PASS | /tests/toolkit/xre/test/test_fpuhandler.html | Undefined division-by-zero doesn't crash

I'll push another round to try with a printf to confirm that we hit the code in the testplugin (rather than that we're crashing in the testplugin).
(In reply to comment #15)
> I'll push another round to try with a printf to confirm that we hit the code in
> the testplugin (rather than that we're crashing in the testplugin).

I confirmed that we are hitting the code in the testplugin that changes the FPU state.  So I think it's that the patch just doesn't work when applied to 1.9.2.

That's more than I'm in a position to figure out right now.  Perhaps Ted could help?
Assignee: dbaron → benjamin
Is there any more info in the tinderbox log? I don't know of any obvious changes in the Breakpad Windows code between 1.9.2 and trunk.
(Assignee)

Comment 18

8 years ago
So, that's bug 539530, which is also a problem on trunk. It seems that this is a problem if you run all the mochitests in one pass, but not a problem if you run them separated. I haven't recorded it to figure out what's going on.
Blocks: 539530
This is marked as blocking Firefox 3.6.2, and I wonder if given comment 15 through comment 18 we should be downgrading it to "needed", but I'm not sure how big a problem this is for crash analysis. Input/opinions?

Comment 20

8 years ago
its not so much a problem for crash analysis, since most of the analysis has been done.   It is a problem for users crashing.  js_Interpret was near the very top of top crashes for 3.6 betas before we made skiplist changes.  now we have a variety of signatures like https://bugzilla.mozilla.org/show_bug.cgi?id=537039  js_Interpret:904 and several more that rely on this fix to fix those crashes.

getting this final fix and enabling the other fixes to finally come into play will result in hundreds of fewer crashes per day.  I've venture to say this is the bigest bang for the buck we have on the crashkill hit list right now.
We're not currently running split mochitests on 1.9.2 or 1.9.1, so this test would fail if we landed it on branches. We could land bug 494165 on branches (it's just test harness changes) and RelEng could easily flip on split mochitests there.
OK, the patch is approved but a test would break. Code freeze for 1.9.2.2 is coming up - what do we need to do to get this landed? Should I lean on RelEng for bug 494165? If so, can we add that dependency?
So you think the patch still works on branch, it's just the test that's not working?  If so, my inclination would be to land with the test disabled.
(Assignee)

Comment 24

8 years ago
Yes, I think it will work and that landing without the test is probably the best course of action for now.
I have a TODO to get the necessary patch on branches so we can enable split mochitests, and make the test not fail. Please land the test file on branch, but don't enable it in the makefile.
Let's do it. Code freeze is 11:59pm March 9th!
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/ecd0c04f85ad
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/4579a9920df3
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/844835915e22
status1.9.2: wanted → .2-fixed
Blocks: 540959
Can we test the beta builds to make sure that this is now sending crashes, or at least not sending fewer crashes? Or has it already been tested?
is there a way to test this in the 3.6.2 beta build ?
(Assignee)

Comment 30

8 years ago
You need the testplugin. You can use the one from trunk builds by downloading and unpacking http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/firefox-3.7a4pre.en-US.win32.tests.tar.bz2. The testplugin is in bin/plugins/nptest.dll and you can copy it into {profiledir}/plugins

Then visit http://benjamin.smedbergs.us/tests/test-fpuhandler.html and click "Run Test"
hm i did the steps from comment#30 but it says i have not the plugin installed, while the plugin manager says i have :/ http://img260.imageshack.us/img260/9933/screenshot20100322at103.png
(Assignee)

Comment 32

8 years ago
Hrm, linkage changed a little bit on trunk. Use http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-1.9.2/firefox-3.6.3pre.en-US.win32.tests.tar.bz2 instead. Sorry!
np, thanks Benjamin !

Result from the test on the testpage : All tests passed

with:
Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2

verified for 1.9.2-2
Keywords: verified1.9.2

Comment 34

8 years ago
no 3.6.2 crashes with the isPromoteInt so far

http://crash-stats.mozilla.com/query/query?product=Firefox&version=Firefox%3A3.6.2&date=&range_value=1&range_unit=weeks&query_search=signature&query_type=contains&query=isPromoteInt&build_id=&process_type=all&do_query=1

no js_Interpret:912 crashes as well

but there are a couple js_Interpret:916 and 904 crashes, but volume appears to be way down on those.

http://crash-stats.mozilla.com/query/query?product=Firefox&version=Firefox%3A3.6.2&date=&range_value=1&range_unit=weeks&query_search=signature&query_type=contains&query=js_Interpret%3A9&build_id=&process_type=all&do_query=1

so the effects of the other patches and this patch seem to be taking hold.

Updated

8 years ago
Duplicate of this bug: 532141
Depends on: 563662
You need to log in before you can comment on or make changes to this bug.