Exceptions on windows 64 are not fatal

RESOLVED FIXED

Status

()

defect
--
critical
RESOLVED FIXED
9 years ago
9 years ago

People

(Reporter: joe, Assigned: benjamin)

Tracking

Trunk
x86
Windows 7
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking2.0 beta5+)

Details

Attachments

(4 attachments)

Basically all exceptions on WOW64 (Windows 32 bit applications running on Windows 64) are non-fatal. This means that when we try to fire a _debugbreak() in our OOM handler, instead of breaking, WOW64 eats the exception and returns us to the event loop. This means that when we dereference a null pointer, Windows 64 returns us to the event loop. We are left in an utterly invalid state, unable to draw or progress any further, left spinning in the event loop in a futile attempt to get any work done.

This is bad.
blocking2.0: --- → beta4+
This, I believe, only happens when we call from our code back into Windows code, back into our code. 

Mozilla(1) -> Windows -> Mozilla(2)

When Mozilla(2) throws an exception, for example by running out of memory and intentionally crashing, Windows catches it and silently unwinds back out, leaving us (most often) inside the event loop.
This has been bothering me for a long time, and it now makes more sense that WOW64 is involved.  http://www.nynaeve.net/?p=202 might be useful (mention of KiUserApcDispatcher providing a SEH frame, and that dispatcher being involved in kernel->user dispatch).  Also another reference mentions that vectored exception handling overwrites structured, so that might also be related.

It's probably worth trying to wrap our main() in a SEH handler and see what happens.
I should say -- I don't know that this needs to block beta4.  It's been happening for a while now, maybe even on previous versions of firefox, and is only becoming an issue because more and more people are running 64-bit versions of windows.  But it should block beta5.
Severity: normal → critical
Agreed with comment 3
blocking2.0: beta4+ → beta5+
Duplicate of this bug: 586909
(In reply to comment #3)
> I should say -- I don't know that this needs to block beta4.  It's been
> happening for a while now, maybe even on previous versions of firefox, and is
> only becoming an issue because more and more people are running 64-bit versions
> of windows.  But it should block beta5.

If you guys are planning on turning on D2D by default then this needs to be fixed soon as it is leading to quite a common issue among testers see Bug 586909 for example.

This kind of issue makes it difficult to test D2D in the short term and since, as I understand it, the whole point of enabling D2D now was to try and get some user coverage this is going to hurt that.
I can't reproduce this with CrashMe on Win64
I wonder if this only happens on crashes that happen during paint
Comment 1 implies that it only happens when we're being called back by Windows, so inside a WM_PAINT seems likely. (CrashMe is just triggered off a menu item, which is serviced via the event loop, but probably not via a Windows message).
This behavior is documented in MSDN:

http://msdn.microsoft.com/en-us/library/ms633573%28VS.85%29.aspx

It appears to me that we need to hook up our crash reporter with our window proc(s), and just invoke it and terminate the process in case of a fatal exception.
I think this should probably block enabling D2D by default, otherwise we stand to miss out on bunching of crash feedback and giving people a bad experience at the same time.
(In reply to comment #11)
> We use SEH around a11y calls currently:
> http://mxr.mozilla.org/mozilla-central/source/accessible/src/msaa/nsAccessNodeWrap.cpp#181
> http://mxr.mozilla.org/mozilla-central/source/accessible/src/msaa/nsAccessNodeWrap.cpp#584

This seems exactly like what we need to be doing here:

http://mxr.mozilla.org/mozilla-central/source/accessible/src/msaa/nsAccessNodeWrap.cpp#584

Except that we should not return EXCEPTION_CONTINUE_SEARCH I think.  I think the right thing to do here is to call TerminateProcess instead (provided that the crash reporter can handle that!)
Assignee: nobody → benjamin
That's pretty ridiculous.

But, we need to test -- if we add an exception handler and then TerminateProcess, we'd still like to know what the actual crash was.  Can we somehow extend breakpad to pass it to the exception structure?  Otherwise, this is just going to be a black hole "something crashed in the WndProc" type of report, since we won't have a stack or anything.
Attachment #466370 - Flags: review?(ted.mielczarek)
Attachment #466370 - Flags: review?(jmuizelaar)
vlad: the examples I linked to (and bsmedberg's patch) call nsICrashReporter::WriteMinidumpForException, which takes an EXCEPTION_POINTERS* argument, which we get from the SEH catch block, so crash reports will show up with the crashing thread stack shown from wherever the original crash was.
Comment on attachment 466370 [details] [diff] [review]
Report exceptions from nsWindow::WindowProc, rev. 1

Looks sensible. If we could figure out a reliable way to test this (even as a Litmus test) that would be awesome.
Attachment #466370 - Flags: review?(ted.mielczarek) → review+
Comment on attachment 466370 [details] [diff] [review]
Report exceptions from nsWindow::WindowProc, rev. 1

This doesn't seem quite like the right approach to me.  Why are we converting all of the possible exception codes into crashes?
Isn't everything that __except catches a crash if you don't catch it? Why wouldn't we catch all of them?
This seems to work for me. At least it quickly exited firefox instead of partially starting up. I'm building with crash reporter enabled right now to make sure that works.
(In reply to comment #19)
> Isn't everything that __except catches a crash if you don't catch it? Why
> wouldn't we catch all of them?

Yes.  Well, I guess we could go with this patch as well, since the MSDN docs do not seem to suggest that any kind of filtering happens by Windows.  I would assume that our crash reporting infrastructure is capable of handling different exception codes...
Indeed, we do.
(In reply to comment #19)
> Isn't everything that __except catches a crash if you don't catch it? Why
> wouldn't we catch all of them?

Right, if the SEH frames below us don't catch it I'd say we need to terminate. I'm guessing noone's planning on catching exceptions -through- the Win32 message layer. I think this fix is exactly what we need and it's roughly what I suggested to release-drivers. So I'm all for this.
(In reply to comment #23)
> (In reply to comment #19)
> > Isn't everything that __except catches a crash if you don't catch it? Why
> > wouldn't we catch all of them?
> 
> Right, if the SEH frames below us don't catch it I'd say we need to terminate.
> I'm guessing noone's planning on catching exceptions -through- the Win32
> message layer. I think this fix is exactly what we need and it's roughly what I
> suggested to release-drivers. So I'm all for this.

FWIW, I don't agree with this in the general sense, it's always possible to have a global handler to handle some benign exceptions (or even ignore them), and it is possible to pass custom exception codes to RaiseException.  But in *this* case, none of this matters since Windows will be eating the exceptions up if we don't do it!
http://hg.mozilla.org/mozilla-central/rev/ed9357ce0939
http://hg.mozilla.org/mozilla-central/rev/a2c829756d99
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Attachment #466370 - Flags: review?(jmuizelaar) → approval1.9.2.10?
(In reply to comment #14)
> That's pretty ridiculous.
Agreed. That pretty much breaks just-in-time debugging too.
Does the debugger catch these on Windows 64, or are they eaten? I'm happy to #ifdef this code so it doesn't run in debug builds, or checks whether a debugger is attached, if that's possible.
It never caught it for me.  You can probably do it if you manually configure breaking on caught exceptions or something (since as far as the debugger knows, the exception was handled, it never propagated back up).
(In reply to comment #27)
> Does the debugger catch these on Windows 64, or are they eaten? I'm happy to
> #ifdef this code so it doesn't run in debug builds, or checks whether a
> debugger is attached, if that's possible.

You can call IsDebuggerPresent(), and if it returns TRUE, call DebugBreak() maybe?
This breaks debugging on 32-bit Windows too - I just clicked "Retry" on an assertion, expecting it to invoke the JIT debugger :-(
Is it supposed to catch the crashes from https://bugzilla.mozilla.org/show_bug.cgi?id=586909 now ? i just crashed the usual way with mozalloc.dll with this build

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b4pre) Gecko/20100817 Minefield/4.0b4pre Firefox/3.6.7

and breakpad didn't fire, is there some auto crashme test just to check that breakpad is functional on my box
(In reply to comment #31)
> This breaks debugging on 32-bit Windows too - I just clicked "Retry" on an
> assertion, expecting it to invoke the JIT debugger :-(

Benjamin, can we only look at access violations here?
No, we are not going to look at only access violations, that doesn't make sense. We can add a IsDebuggerPresent/DebugBreak combo in the handler.
(In reply to comment #34)
> No, we are not going to look at only access violations, that doesn't make
> sense. We can add a IsDebuggerPresent/DebugBreak combo in the handler.

We could also let the exception pass up through WindowProc on platforms that support it like 32bit on 32bit.
Depends on: 588383
Attachment #467013 - Flags: review?(ted.mielczarek)
Attachment #467013 - Flags: review?(neil)
Attachment #467013 - Flags: review?(ted.mielczarek) → review+
Comment on attachment 467013 [details] [diff] [review]
IsDebuggerPresent check, rev. 1

This doesn't seem to have any effect at all.
When the debugger isn't attached, the process still terminates. (This is true even if I remove the IsDebuggerPresent check!)
When the debugger is attached, it breaks at the first-chance exception anyway.
Attachment #467013 - Flags: review?(neil) → review-
(In reply to comment #37)
> Comment on attachment 467013 [details] [diff] [review]
> IsDebuggerPresent check, rev. 1
> 
> This doesn't seem to have any effect at all.
> When the debugger isn't attached, the process still terminates. (This is true
> even if I remove the IsDebuggerPresent check!)
> When the debugger is attached, it breaks at the first-chance exception anyway.

Often debuggers are not set to break at first-chance exceptions. MSVC's default scheme for example only breaks at unhandled exceptions.
Attachment #466370 - Attachment is obsolete: true
Attachment #467119 - Flags: review?(neil)
Attachment #466370 - Flags: approval1.9.2.10?
Attachment #466370 - Attachment is obsolete: false
Attachment #467013 - Attachment is obsolete: true
Comment on attachment 467119 [details] [diff] [review]
Use FatalExit to enter the debugger, rev. 1

So, given comment #39, DebugBreak is less worse than FatalExit.
Attachment #467119 - Flags: review?(neil) → review-
Attachment #467013 - Attachment is obsolete: false
Attachment #467013 - Flags: review- → review+
What about using UnhandledExceptionFilter?
If the debugger is attached, it will return EXCEPTION_CONTINUE_SEARCH immediately.
If the debugger is not attached, it will call user-defined filter which will invoke our crash reporter.
If the debugger is not attached and the user-defined filter doesn't handle the exception, it will invoke the JIT debugger before returning EXCEPTION_CONTINUE_SEARCH.
So we can DebugBreak if the function returns EXCEPTION_CONTINUE_SEARCH.
Here's a pseudocode of UnHandledExceptionFilter.
https://www.microsoft.com/msj/0197/exception/exceptiontextfigs.htm#fig13
This really seems as though it hasn't been totally fixed, as people are still seeing a lot of uncrashing crashes on win64.

Also, interestingly: http://support.microsoft.com/kb/976038
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 589495
Gavin asked that I not reopen this despite it remaining unfixed. I have filed bug 589495 on finishing the fixing of this bug.
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.