587406 - Exceptions on windows 64 are not fatal

Reporter

Description

•

14 years ago

Basically all exceptions on WOW64 (Windows 32 bit applications running on Windows 64) are non-fatal. This means that when we try to fire a _debugbreak() in our OOM handler, instead of breaking, WOW64 eats the exception and returns us to the event loop. This means that when we dereference a null pointer, Windows 64 returns us to the event loop. We are left in an utterly invalid state, unable to draw or progress any further, left spinning in the event loop in a futile attempt to get any work done. This is bad.

Joe Drew (not getting mail)

Reporter

Updated

•

14 years ago

blocking2.0: --- → beta4+

Joe Drew (not getting mail)

Reporter

Comment 1

•

14 years ago

This, I believe, only happens when we call from our code back into Windows code, back into our code. Mozilla(1) -> Windows -> Mozilla(2) When Mozilla(2) throws an exception, for example by running out of memory and intentionally crashing, Windows catches it and silently unwinds back out, leaving us (most often) inside the event loop.

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Comment 2

•

14 years ago

This has been bothering me for a long time, and it now makes more sense that WOW64 is involved. http://www.nynaeve.net/?p=202 might be useful (mention of KiUserApcDispatcher providing a SEH frame, and that dispatcher being involved in kernel->user dispatch). Also another reference mentions that vectored exception handling overwrites structured, so that might also be related. It's probably worth trying to wrap our main() in a SEH handler and see what happens.

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Comment 3

•

14 years ago

I should say -- I don't know that this needs to block beta4. It's been happening for a while now, maybe even on previous versions of firefox, and is only becoming an issue because more and more people are running 64-bit versions of windows. But it should block beta5.

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Updated

•

14 years ago

OS: Mac OS X → Windows 7

Wayne Mery (:wsmwk)

Updated

•

14 years ago

Severity: normal → critical

Mike Beltzner [:beltzner, not reading bugmail]

Comment 4

•

14 years ago

Agreed with comment 3

blocking2.0: beta4+ → beta5+

Ryan Jones-Ward [:sciguyryan]

Comment 6

•

14 years ago

(In reply to comment #3) > I should say -- I don't know that this needs to block beta4. It's been > happening for a while now, maybe even on previous versions of firefox, and is > only becoming an issue because more and more people are running 64-bit versions > of windows. But it should block beta5. If you guys are planning on turning on D2D by default then this needs to be fixed soon as it is leading to quite a common issue among testers see Bug 586909 for example. This kind of issue makes it difficult to test D2D in the short term and since, as I understand it, the whole point of enabling D2D now was to try and get some user coverage this is going to hurt that.

Jeff Muizelaar [:jrmuizel]

Comment 7

•

14 years ago

I can't reproduce this with CrashMe on Win64

Jeff Muizelaar [:jrmuizel]

Comment 8

•

14 years ago

I wonder if this only happens on crashes that happen during paint

(not currently active) Ted Mielczarek

Comment 9

•

14 years ago

Comment 1 implies that it only happens when we're being called back by Windows, so inside a WM_PAINT seems likely. (CrashMe is just triggered off a menu item, which is serviced via the event loop, but probably not via a Windows message).

(no longer active)

Comment 10

•

14 years ago

This behavior is documented in MSDN: http://msdn.microsoft.com/en-us/library/ms633573%28VS.85%29.aspx It appears to me that we need to hook up our crash reporter with our window proc(s), and just invoke it and terminate the process in case of a fatal exception.

(not currently active) Ted Mielczarek

Comment 11

•

14 years ago

We use SEH around a11y calls currently: http://mxr.mozilla.org/mozilla-central/source/accessible/src/msaa/nsAccessNodeWrap.cpp#181 http://mxr.mozilla.org/mozilla-central/source/accessible/src/msaa/nsAccessNodeWrap.cpp#584

Jeff Muizelaar [:jrmuizel]

Comment 12

•

14 years ago

I think this should probably block enabling D2D by default, otherwise we stand to miss out on bunching of crash feedback and giving people a bad experience at the same time.

(no longer active)

Comment 13

•

14 years ago

(In reply to comment #11) > We use SEH around a11y calls currently: > http://mxr.mozilla.org/mozilla-central/source/accessible/src/msaa/nsAccessNodeWrap.cpp#181 > http://mxr.mozilla.org/mozilla-central/source/accessible/src/msaa/nsAccessNodeWrap.cpp#584 This seems exactly like what we need to be doing here: http://mxr.mozilla.org/mozilla-central/source/accessible/src/msaa/nsAccessNodeWrap.cpp#584 Except that we should not return EXCEPTION_CONTINUE_SEARCH I think. I think the right thing to do here is to call TerminateProcess instead (provided that the crash reporter can handle that!)

Benjamin Smedberg

Assignee

Updated

•

14 years ago

Assignee: nobody → benjamin

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Comment 14

•

14 years ago

That's pretty ridiculous. But, we need to test -- if we add an exception handler and then TerminateProcess, we'd still like to know what the actual crash was. Can we somehow extend breakpad to pass it to the exception structure? Otherwise, this is just going to be a black hole "something crashed in the WndProc" type of report, since we won't have a stack or anything.

Benjamin Smedberg

Assignee

Comment 15

•

14 years ago

Attached patch Report exceptions from nsWindow::WindowProc, rev. 1 — Details — Splinter Review

Attachment #466370 - Flags: review?(ted.mielczarek)

Attachment #466370 - Flags: review?(jmuizelaar)

(not currently active) Ted Mielczarek

Comment 16

•

14 years ago

vlad: the examples I linked to (and bsmedberg's patch) call nsICrashReporter::WriteMinidumpForException, which takes an EXCEPTION_POINTERS* argument, which we get from the SEH catch block, so crash reports will show up with the crashing thread stack shown from wherever the original crash was.

(not currently active) Ted Mielczarek

Comment 17

•

14 years ago

Comment on attachment 466370 [details] [diff] [review] Report exceptions from nsWindow::WindowProc, rev. 1 Looks sensible. If we could figure out a reliable way to test this (even as a Litmus test) that would be awesome.

Attachment #466370 - Flags: review?(ted.mielczarek) → review+

(no longer active)

Comment 18

•

14 years ago

Comment on attachment 466370 [details] [diff] [review] Report exceptions from nsWindow::WindowProc, rev. 1 This doesn't seem quite like the right approach to me. Why are we converting all of the possible exception codes into crashes?

Benjamin Smedberg

Assignee

Comment 19

•

14 years ago

Isn't everything that __except catches a crash if you don't catch it? Why wouldn't we catch all of them?

Jeff Muizelaar [:jrmuizel]

Comment 20

•

14 years ago

This seems to work for me. At least it quickly exited firefox instead of partially starting up. I'm building with crash reporter enabled right now to make sure that works.

(no longer active)

Comment 21

•

14 years ago

(In reply to comment #19) > Isn't everything that __except catches a crash if you don't catch it? Why > wouldn't we catch all of them? Yes. Well, I guess we could go with this patch as well, since the MSDN docs do not seem to suggest that any kind of filtering happens by Windows. I would assume that our crash reporting infrastructure is capable of handling different exception codes...

Benjamin Smedberg

Assignee

Comment 22

•

14 years ago

Indeed, we do.

Bas Schouten (:bas.schouten)

Comment 23

•

14 years ago

(In reply to comment #19) > Isn't everything that __except catches a crash if you don't catch it? Why > wouldn't we catch all of them? Right, if the SEH frames below us don't catch it I'd say we need to terminate. I'm guessing noone's planning on catching exceptions -through- the Win32 message layer. I think this fix is exactly what we need and it's roughly what I suggested to release-drivers. So I'm all for this.

(no longer active)

Comment 24

•

14 years ago

(In reply to comment #23) > (In reply to comment #19) > > Isn't everything that __except catches a crash if you don't catch it? Why > > wouldn't we catch all of them? > > Right, if the SEH frames below us don't catch it I'd say we need to terminate. > I'm guessing noone's planning on catching exceptions -through- the Win32 > message layer. I think this fix is exactly what we need and it's roughly what I > suggested to release-drivers. So I'm all for this. FWIW, I don't agree with this in the general sense, it's always possible to have a global handler to handle some benign exceptions (or even ignore them), and it is possible to pass custom exception codes to RaiseException. But in *this* case, none of this matters since Windows will be eating the exceptions up if we don't do it!

Benjamin Smedberg

Assignee

Comment 25

•

14 years ago

http://hg.mozilla.org/mozilla-central/rev/ed9357ce0939 http://hg.mozilla.org/mozilla-central/rev/a2c829756d99

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

Benjamin Smedberg

Assignee

Updated

•

14 years ago

Attachment #466370 - Flags: review?(jmuizelaar) → approval1.9.2.10?

neil@parkwaycc.co.uk

Comment 26

•

14 years ago

(In reply to comment #14) > That's pretty ridiculous. Agreed. That pretty much breaks just-in-time debugging too.

Benjamin Smedberg

Assignee

Comment 27

•

14 years ago

Does the debugger catch these on Windows 64, or are they eaten? I'm happy to #ifdef this code so it doesn't run in debug builds, or checks whether a debugger is attached, if that's possible.

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Comment 28

•

14 years ago

It never caught it for me. You can probably do it if you manually configure breaking on caught exceptions or something (since as far as the debugger knows, the exception was handled, it never propagated back up).

(no longer active)

Comment 29

•

14 years ago

(In reply to comment #27) > Does the debugger catch these on Windows 64, or are they eaten? I'm happy to > #ifdef this code so it doesn't run in debug builds, or checks whether a > debugger is attached, if that's possible. You can call IsDebuggerPresent(), and if it returns TRUE, call DebugBreak() maybe?

timeless

Comment 30

•

14 years ago

bustage fix: http://hg.mozilla.org/mozilla-central/rev/48487f904abe

neil@parkwaycc.co.uk

Comment 31

•

14 years ago

This breaks debugging on 32-bit Windows too - I just clicked "Retry" on an assertion, expecting it to invoke the JIT debugger :-(

Myzar

Comment 32

•

14 years ago

Is it supposed to catch the crashes from https://bugzilla.mozilla.org/show_bug.cgi?id=586909 now ? i just crashed the usual way with mozalloc.dll with this build Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b4pre) Gecko/20100817 Minefield/4.0b4pre Firefox/3.6.7 and breakpad didn't fire, is there some auto crashme test just to check that breakpad is functional on my box

(no longer active)

Comment 33

•

14 years ago

(In reply to comment #31) > This breaks debugging on 32-bit Windows too - I just clicked "Retry" on an > assertion, expecting it to invoke the JIT debugger :-( Benjamin, can we only look at access violations here?

Benjamin Smedberg

Assignee

Comment 34

•

14 years ago

No, we are not going to look at only access violations, that doesn't make sense. We can add a IsDebuggerPresent/DebugBreak combo in the handler.

Jeff Muizelaar [:jrmuizel]

Comment 35

•

14 years ago

(In reply to comment #34) > No, we are not going to look at only access violations, that doesn't make > sense. We can add a IsDebuggerPresent/DebugBreak combo in the handler. We could also let the exception pass up through WindowProc on platforms that support it like 32bit on 32bit.

Jacek Caban

Updated

•

14 years ago

Depends on: 588383

Benjamin Smedberg

Assignee

Comment 36

•

14 years ago

Attached patch IsDebuggerPresent check, rev. 1 — Details — Splinter Review

Attachment #467013 - Flags: review?(ted.mielczarek)

Attachment #467013 - Flags: review?(neil)

(not currently active) Ted Mielczarek

Updated

•

14 years ago

Attachment #467013 - Flags: review?(ted.mielczarek) → review+

neil@parkwaycc.co.uk

Comment 37

•

14 years ago

Comment on attachment 467013 [details] [diff] [review] IsDebuggerPresent check, rev. 1 This doesn't seem to have any effect at all. When the debugger isn't attached, the process still terminates. (This is true even if I remove the IsDebuggerPresent check!) When the debugger is attached, it breaks at the first-chance exception anyway.

Attachment #467013 - Flags: review?(neil) → review-

neil@parkwaycc.co.uk

Comment 38

•

14 years ago

Attached patch What I'm using locally — Details — Splinter Review

Bas Schouten (:bas.schouten)

Comment 39

•

14 years ago

(In reply to comment #37) > Comment on attachment 467013 [details] [diff] [review] > IsDebuggerPresent check, rev. 1 > > This doesn't seem to have any effect at all. > When the debugger isn't attached, the process still terminates. (This is true > even if I remove the IsDebuggerPresent check!) > When the debugger is attached, it breaks at the first-chance exception anyway. Often debuggers are not set to break at first-chance exceptions. MSVC's default scheme for example only breaks at unhandled exceptions.

Benjamin Smedberg

Assignee

Comment 40

•

14 years ago

Attached patch Use FatalExit to enter the debugger, rev. 1 — Details — Splinter Review

Attachment #466370 - Attachment is obsolete: true

Attachment #467119 - Flags: review?(neil)

Attachment #466370 - Flags: approval1.9.2.10?

Benjamin Smedberg

Assignee

Updated

•

14 years ago

Attachment #466370 - Attachment is obsolete: false

Benjamin Smedberg

Assignee

Updated

•

14 years ago

Attachment #467013 - Attachment is obsolete: true

neil@parkwaycc.co.uk

Comment 41

•

14 years ago

Comment on attachment 467119 [details] [diff] [review] Use FatalExit to enter the debugger, rev. 1 So, given comment #39, DebugBreak is less worse than FatalExit.

Attachment #467119 - Flags: review?(neil) → review-

neil@parkwaycc.co.uk

Updated

•

14 years ago

Attachment #467013 - Attachment is obsolete: false

Attachment #467013 - Flags: review- → review+

Masatoshi Kimura [:emk]

Comment 42

•

14 years ago

What about using UnhandledExceptionFilter? If the debugger is attached, it will return EXCEPTION_CONTINUE_SEARCH immediately. If the debugger is not attached, it will call user-defined filter which will invoke our crash reporter. If the debugger is not attached and the user-defined filter doesn't handle the exception, it will invoke the JIT debugger before returning EXCEPTION_CONTINUE_SEARCH. So we can DebugBreak if the function returns EXCEPTION_CONTINUE_SEARCH. Here's a pseudocode of UnHandledExceptionFilter. https://www.microsoft.com/msj/0197/exception/exceptiontextfigs.htm#fig13

Joe Drew (not getting mail)

Reporter

Comment 43

•

14 years ago

This really seems as though it hasn't been totally fixed, as people are still seeing a lot of uncrashing crashes on win64. Also, interestingly: http://support.microsoft.com/kb/976038

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Joe Drew (not getting mail)

Reporter

Updated

•

14 years ago

Depends on: 589495

Joe Drew (not getting mail)

Reporter

Comment 44

•

14 years ago

Gavin asked that I not reopen this despite it remaining unfixed. I have filed bug 589495 on finishing the fixing of this bug.

Status: REOPENED → RESOLVED

Closed: 14 years ago → 14 years ago

Resolution: --- → FIXED

Report exceptions from nsWindow::WindowProc, rev. 1 14 years ago Benjamin Smedberg 2.60 KB, patch	ted : review+	Details \| Diff \| Splinter Review
IsDebuggerPresent check, rev. 1 14 years ago Benjamin Smedberg 805 bytes, patch	ted : review+ neil : review+	Details \| Diff \| Splinter Review
What I'm using locally 14 years ago neil@parkwaycc.co.uk 2.23 KB, patch		Details \| Diff \| Splinter Review
Use FatalExit to enter the debugger, rev. 1 14 years ago Benjamin Smedberg 1.00 KB, patch	neil : review-	Details \| Diff \| Splinter Review