Closed
Bug 587406
Opened 14 years ago
Closed 14 years ago
Exceptions on windows 64 are not fatal
Categories
(Core :: General, defect)
Tracking
()
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
blocking2.0 | --- | beta5+ |
People
(Reporter: joe, Assigned: benjamin)
References
Details
Attachments
(4 files)
2.60 KB,
patch
|
ted
:
review+
|
Details | Diff | Splinter Review |
805 bytes,
patch
|
ted
:
review+
neil
:
review+
|
Details | Diff | Splinter Review |
2.23 KB,
patch
|
Details | Diff | Splinter Review | |
1.00 KB,
patch
|
neil
:
review-
|
Details | Diff | Splinter Review |
Basically all exceptions on WOW64 (Windows 32 bit applications running on Windows 64) are non-fatal. This means that when we try to fire a _debugbreak() in our OOM handler, instead of breaking, WOW64 eats the exception and returns us to the event loop. This means that when we dereference a null pointer, Windows 64 returns us to the event loop. We are left in an utterly invalid state, unable to draw or progress any further, left spinning in the event loop in a futile attempt to get any work done.
This is bad.
Reporter | ||
Updated•14 years ago
|
blocking2.0: --- → beta4+
Reporter | ||
Comment 1•14 years ago
|
||
This, I believe, only happens when we call from our code back into Windows code, back into our code.
Mozilla(1) -> Windows -> Mozilla(2)
When Mozilla(2) throws an exception, for example by running out of memory and intentionally crashing, Windows catches it and silently unwinds back out, leaving us (most often) inside the event loop.
This has been bothering me for a long time, and it now makes more sense that WOW64 is involved. http://www.nynaeve.net/?p=202 might be useful (mention of KiUserApcDispatcher providing a SEH frame, and that dispatcher being involved in kernel->user dispatch). Also another reference mentions that vectored exception handling overwrites structured, so that might also be related.
It's probably worth trying to wrap our main() in a SEH handler and see what happens.
I should say -- I don't know that this needs to block beta4. It's been happening for a while now, maybe even on previous versions of firefox, and is only becoming an issue because more and more people are running 64-bit versions of windows. But it should block beta5.
OS: Mac OS X → Windows 7
Updated•14 years ago
|
Severity: normal → critical
Comment 6•14 years ago
|
||
(In reply to comment #3)
> I should say -- I don't know that this needs to block beta4. It's been
> happening for a while now, maybe even on previous versions of firefox, and is
> only becoming an issue because more and more people are running 64-bit versions
> of windows. But it should block beta5.
If you guys are planning on turning on D2D by default then this needs to be fixed soon as it is leading to quite a common issue among testers see Bug 586909 for example.
This kind of issue makes it difficult to test D2D in the short term and since, as I understand it, the whole point of enabling D2D now was to try and get some user coverage this is going to hurt that.
Comment 7•14 years ago
|
||
I can't reproduce this with CrashMe on Win64
Comment 8•14 years ago
|
||
I wonder if this only happens on crashes that happen during paint
Comment 9•14 years ago
|
||
Comment 1 implies that it only happens when we're being called back by Windows, so inside a WM_PAINT seems likely. (CrashMe is just triggered off a menu item, which is serviced via the event loop, but probably not via a Windows message).
Comment 10•14 years ago
|
||
This behavior is documented in MSDN:
http://msdn.microsoft.com/en-us/library/ms633573%28VS.85%29.aspx
It appears to me that we need to hook up our crash reporter with our window proc(s), and just invoke it and terminate the process in case of a fatal exception.
Comment 11•14 years ago
|
||
Comment 12•14 years ago
|
||
I think this should probably block enabling D2D by default, otherwise we stand to miss out on bunching of crash feedback and giving people a bad experience at the same time.
Comment 13•14 years ago
|
||
(In reply to comment #11)
> We use SEH around a11y calls currently:
> http://mxr.mozilla.org/mozilla-central/source/accessible/src/msaa/nsAccessNodeWrap.cpp#181
> http://mxr.mozilla.org/mozilla-central/source/accessible/src/msaa/nsAccessNodeWrap.cpp#584
This seems exactly like what we need to be doing here:
http://mxr.mozilla.org/mozilla-central/source/accessible/src/msaa/nsAccessNodeWrap.cpp#584
Except that we should not return EXCEPTION_CONTINUE_SEARCH I think. I think the right thing to do here is to call TerminateProcess instead (provided that the crash reporter can handle that!)
Assignee | ||
Updated•14 years ago
|
Assignee: nobody → benjamin
That's pretty ridiculous.
But, we need to test -- if we add an exception handler and then TerminateProcess, we'd still like to know what the actual crash was. Can we somehow extend breakpad to pass it to the exception structure? Otherwise, this is just going to be a black hole "something crashed in the WndProc" type of report, since we won't have a stack or anything.
Assignee | ||
Comment 15•14 years ago
|
||
Attachment #466370 -
Flags: review?(ted.mielczarek)
Attachment #466370 -
Flags: review?(jmuizelaar)
Comment 16•14 years ago
|
||
vlad: the examples I linked to (and bsmedberg's patch) call nsICrashReporter::WriteMinidumpForException, which takes an EXCEPTION_POINTERS* argument, which we get from the SEH catch block, so crash reports will show up with the crashing thread stack shown from wherever the original crash was.
Comment 17•14 years ago
|
||
Comment on attachment 466370 [details] [diff] [review]
Report exceptions from nsWindow::WindowProc, rev. 1
Looks sensible. If we could figure out a reliable way to test this (even as a Litmus test) that would be awesome.
Attachment #466370 -
Flags: review?(ted.mielczarek) → review+
Comment 18•14 years ago
|
||
Comment on attachment 466370 [details] [diff] [review]
Report exceptions from nsWindow::WindowProc, rev. 1
This doesn't seem quite like the right approach to me. Why are we converting all of the possible exception codes into crashes?
Assignee | ||
Comment 19•14 years ago
|
||
Isn't everything that __except catches a crash if you don't catch it? Why wouldn't we catch all of them?
Comment 20•14 years ago
|
||
This seems to work for me. At least it quickly exited firefox instead of partially starting up. I'm building with crash reporter enabled right now to make sure that works.
Comment 21•14 years ago
|
||
(In reply to comment #19)
> Isn't everything that __except catches a crash if you don't catch it? Why
> wouldn't we catch all of them?
Yes. Well, I guess we could go with this patch as well, since the MSDN docs do not seem to suggest that any kind of filtering happens by Windows. I would assume that our crash reporting infrastructure is capable of handling different exception codes...
Assignee | ||
Comment 22•14 years ago
|
||
Indeed, we do.
Comment 23•14 years ago
|
||
(In reply to comment #19)
> Isn't everything that __except catches a crash if you don't catch it? Why
> wouldn't we catch all of them?
Right, if the SEH frames below us don't catch it I'd say we need to terminate. I'm guessing noone's planning on catching exceptions -through- the Win32 message layer. I think this fix is exactly what we need and it's roughly what I suggested to release-drivers. So I'm all for this.
Comment 24•14 years ago
|
||
(In reply to comment #23)
> (In reply to comment #19)
> > Isn't everything that __except catches a crash if you don't catch it? Why
> > wouldn't we catch all of them?
>
> Right, if the SEH frames below us don't catch it I'd say we need to terminate.
> I'm guessing noone's planning on catching exceptions -through- the Win32
> message layer. I think this fix is exactly what we need and it's roughly what I
> suggested to release-drivers. So I'm all for this.
FWIW, I don't agree with this in the general sense, it's always possible to have a global handler to handle some benign exceptions (or even ignore them), and it is possible to pass custom exception codes to RaiseException. But in *this* case, none of this matters since Windows will be eating the exceptions up if we don't do it!
Assignee | ||
Comment 25•14 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/ed9357ce0939
http://hg.mozilla.org/mozilla-central/rev/a2c829756d99
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•14 years ago
|
Attachment #466370 -
Flags: review?(jmuizelaar) → approval1.9.2.10?
Comment 26•14 years ago
|
||
(In reply to comment #14)
> That's pretty ridiculous.
Agreed. That pretty much breaks just-in-time debugging too.
Assignee | ||
Comment 27•14 years ago
|
||
Does the debugger catch these on Windows 64, or are they eaten? I'm happy to #ifdef this code so it doesn't run in debug builds, or checks whether a debugger is attached, if that's possible.
It never caught it for me. You can probably do it if you manually configure breaking on caught exceptions or something (since as far as the debugger knows, the exception was handled, it never propagated back up).
Comment 29•14 years ago
|
||
(In reply to comment #27)
> Does the debugger catch these on Windows 64, or are they eaten? I'm happy to
> #ifdef this code so it doesn't run in debug builds, or checks whether a
> debugger is attached, if that's possible.
You can call IsDebuggerPresent(), and if it returns TRUE, call DebugBreak() maybe?
Comment 30•14 years ago
|
||
Comment 31•14 years ago
|
||
This breaks debugging on 32-bit Windows too - I just clicked "Retry" on an assertion, expecting it to invoke the JIT debugger :-(
Comment 32•14 years ago
|
||
Is it supposed to catch the crashes from https://bugzilla.mozilla.org/show_bug.cgi?id=586909 now ? i just crashed the usual way with mozalloc.dll with this build
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b4pre) Gecko/20100817 Minefield/4.0b4pre Firefox/3.6.7
and breakpad didn't fire, is there some auto crashme test just to check that breakpad is functional on my box
Comment 33•14 years ago
|
||
(In reply to comment #31)
> This breaks debugging on 32-bit Windows too - I just clicked "Retry" on an
> assertion, expecting it to invoke the JIT debugger :-(
Benjamin, can we only look at access violations here?
Assignee | ||
Comment 34•14 years ago
|
||
No, we are not going to look at only access violations, that doesn't make sense. We can add a IsDebuggerPresent/DebugBreak combo in the handler.
Comment 35•14 years ago
|
||
(In reply to comment #34)
> No, we are not going to look at only access violations, that doesn't make
> sense. We can add a IsDebuggerPresent/DebugBreak combo in the handler.
We could also let the exception pass up through WindowProc on platforms that support it like 32bit on 32bit.
Assignee | ||
Comment 36•14 years ago
|
||
Attachment #467013 -
Flags: review?(ted.mielczarek)
Attachment #467013 -
Flags: review?(neil)
Updated•14 years ago
|
Attachment #467013 -
Flags: review?(ted.mielczarek) → review+
Comment 37•14 years ago
|
||
Comment on attachment 467013 [details] [diff] [review]
IsDebuggerPresent check, rev. 1
This doesn't seem to have any effect at all.
When the debugger isn't attached, the process still terminates. (This is true even if I remove the IsDebuggerPresent check!)
When the debugger is attached, it breaks at the first-chance exception anyway.
Attachment #467013 -
Flags: review?(neil) → review-
Comment 38•14 years ago
|
||
Comment 39•14 years ago
|
||
(In reply to comment #37)
> Comment on attachment 467013 [details] [diff] [review]
> IsDebuggerPresent check, rev. 1
>
> This doesn't seem to have any effect at all.
> When the debugger isn't attached, the process still terminates. (This is true
> even if I remove the IsDebuggerPresent check!)
> When the debugger is attached, it breaks at the first-chance exception anyway.
Often debuggers are not set to break at first-chance exceptions. MSVC's default scheme for example only breaks at unhandled exceptions.
Assignee | ||
Comment 40•14 years ago
|
||
Attachment #466370 -
Attachment is obsolete: true
Attachment #467119 -
Flags: review?(neil)
Attachment #466370 -
Flags: approval1.9.2.10?
Assignee | ||
Updated•14 years ago
|
Attachment #466370 -
Attachment is obsolete: false
Assignee | ||
Updated•14 years ago
|
Attachment #467013 -
Attachment is obsolete: true
Comment 41•14 years ago
|
||
Comment on attachment 467119 [details] [diff] [review]
Use FatalExit to enter the debugger, rev. 1
So, given comment #39, DebugBreak is less worse than FatalExit.
Attachment #467119 -
Flags: review?(neil) → review-
Updated•14 years ago
|
Attachment #467013 -
Attachment is obsolete: false
Attachment #467013 -
Flags: review- → review+
Comment 42•14 years ago
|
||
What about using UnhandledExceptionFilter?
If the debugger is attached, it will return EXCEPTION_CONTINUE_SEARCH immediately.
If the debugger is not attached, it will call user-defined filter which will invoke our crash reporter.
If the debugger is not attached and the user-defined filter doesn't handle the exception, it will invoke the JIT debugger before returning EXCEPTION_CONTINUE_SEARCH.
So we can DebugBreak if the function returns EXCEPTION_CONTINUE_SEARCH.
Here's a pseudocode of UnHandledExceptionFilter.
https://www.microsoft.com/msj/0197/exception/exceptiontextfigs.htm#fig13
Reporter | ||
Comment 43•14 years ago
|
||
This really seems as though it hasn't been totally fixed, as people are still seeing a lot of uncrashing crashes on win64.
Also, interestingly: http://support.microsoft.com/kb/976038
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter | ||
Comment 44•14 years ago
|
||
Gavin asked that I not reopen this despite it remaining unfixed. I have filed bug 589495 on finishing the fixing of this bug.
Status: REOPENED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•