Closed Bug 514589 Opened 15 years ago Closed 15 years ago

Pure virtual function call crashes show up as [@ KiFastSystemCallRet]

Categories

(Socorro :: General, task, P2)

x86
Windows Vista

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: jruderman, Assigned: lars)

References

Details

(Whiteboard: [1.9.2-fixed])

Most or all [@ KiFastSystemCallRet] crashes are actually pure virtual function calls in various parts of the codebase. They also have "No crash" as the crash reason, which is incorrect; the reason should be "Pure virtual function call". Treated as a single crash, it's topcrash #21 for 3.5.2. But it should be treated as separate crashes, e.g. [@ nsHTMLFormElement::RemoveElement] for the following example.
http://crash-stats.mozilla.com/report/index/e0e5731a-e16d-46ab-9357-186c52090830
KiFastSystemCallRet
NtWaitForSingleObject
WaitForSingleObjectEx
WaitForSingleObject
google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread
google_breakpad::ExceptionHandler::HandlePureVirtualCall
_purecall
nsHTMLFormElement::RemoveElement
nsGenericHTMLFormElement::ClearForm
arena_dalloc
nsHTMLInputElement::`scalar deleting destructor'
nsNodeUtils::LastRelease
nsGenericElement::Release
nsCOMPtr_base::~nsCOMPtr_base
...
Note that it'd be good to distinguish [@ _purecall - nsHTMLFormElement::RemoveElement] from [@ nsHTMLFormElement::RemoveElement]. Of the two crashes w/ [@ KiFastSystemCallRet], one of them also had a similar signature but was a simple null pointer deref. There's a huge difference between a null pointer deref and a _purecall ....
This is also related to bug 397199. We do have an assertion object in the minidump that's not currently exposed to Socorro. The assertion object contains at least an enum telling what type of problem we hit: http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/google-breakpad/src/google_breakpad/common/minidump_format.h#703 With this info, Socorro could know to handle these better. We might be able to implement a workaround though, something like "if frame 6 is _purecall, the signature should be frame 6 + frame 7". (It should always be frame 6, if we hit this type of crash.)
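For illustration only, the fixed-position workaround suggested above could be sketched roughly as follows. This is not Socorro code: the function name, the list-of-strings stack representation, and the " | " separator are all assumptions made for the example.

```python
def fixed_frame_signature(frames, purecall_index=6):
    """Hypothetical sketch: if the frame at purecall_index is _purecall,
    build the signature from that frame plus the one that called it.
    Otherwise fall back to the top frame, as the existing behavior does."""
    has_caller = len(frames) > purecall_index + 1
    if has_caller and frames[purecall_index] == "_purecall":
        # " | " as the separator is an assumption for this sketch.
        return frames[purecall_index] + " | " + frames[purecall_index + 1]
    return frames[0] if frames else ""


# Stack taken from the report in comment 0.
stack = [
    "KiFastSystemCallRet",
    "NtWaitForSingleObject",
    "WaitForSingleObjectEx",
    "WaitForSingleObject",
    "google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread",
    "google_breakpad::ExceptionHandler::HandlePureVirtualCall",
    "_purecall",
    "nsHTMLFormElement::RemoveElement",
    "nsGenericHTMLFormElement::ClearForm",
]
print(fixed_frame_signature(stack))
```

A rule keyed to one frame number only works as long as _purecall really does land at that position in every such stack.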
I don't think this is a Breakpad bug. Breakpad is reporting this info to the best of its ability, it's just that Socorro currently doesn't have access to the full info.
Component: Breakpad Integration → Socorro
Product: Toolkit → Webtools
QA Contact: breakpad.integration → socorro
Also, FWIW, is there a bug filed on the actual crash whose stack is in comment 0? I just had someone give me a minidump showing a very similar stack:
0012ee70 100382c2 mozcrt19!_purecall(void)+0x14 [e:\builds\moz2_slave\mozilla-central-win32-nightly\build\obj-firefox\memory\jemalloc\crtsrc\purevirt.c @ 54]
0012ee94 1003ff00 xul!nsHTMLFormElement::RemoveElement(class nsIFormControl * aChild = 0x02e5041c, int aNotify = 0)+0x10 [e:\builds\moz2_slave\mozilla-central-win32-nightly\build\content\html\content\src\nshtmlformelement.cpp @ 1463]
0012efe0 1005996c xul!nsGenericHTMLFormElement::ClearForm(int aRemoveFromForm = 1, int aNotify = 0)+0xd5 [e:\builds\moz2_slave\mozilla-central-win32-nightly\build\content\html\content\src\nsgenerichtmlelement.cpp @ 2327]
0012effc 1001ce56 xul!nsHTMLInputElement::~nsHTMLInputElement(void)+0xbe [e:\builds\moz2_slave\mozilla-central-win32-nightly\build\content\html\content\src\nshtmlinputelement.cpp @ 477]
0012f004 10113822 xul!nsHTMLInputElement::`scalar deleting destructor'(void)+0x8
0012f024 101262bd xul!nsNodeUtils::LastRelease(class nsINode * aNode = 0x10713b73)+0xa2 [e:\builds\moz2_slave\mozilla-central-win32-nightly\build\content\base\src\nsnodeutils.cpp @ 260]
0012f034 10124e3e xul!nsGenericElement::Release(void)+0x1d [e:\builds\moz2_slave\mozilla-central-win32-nightly\build\content\base\src\nsgenericelement.cpp @ 4218]
0012f03c 1048ea3d xul!nsCOMPtr_base::~nsCOMPtr_base(void)+0xe [e:\builds\moz2_slave\mozilla-central-win32-nightly\build\obj-firefox\xpcom\build\nscomptr.cpp @ 82]
0012f044 1049335b xul!nsHtml5TreeOperation::~nsHtml5TreeOperation(void)+0x1d [e:\builds\moz2_slave\mozilla-central-win32-nightly\build\parser\html\nshtml5treeoperation.cpp @ 60]
0012f04c 104933a5 xul!nsHtml5TreeOperation::`scalar deleting destructor'(void)+0x9
0012f05c 10713931 xul!nsTArray<nsHtml5TreeOperation>::DestructRange(unsigned int start = 0x12f0c0, unsigned int count = 0x15b8080)+0x1c [e:\builds\moz2_slave\mozilla-central-win32-nightly\build\obj-firefox\dist\include\nstarray.h @ 862]
0012f068 1071397b xul!nsTArray<nsHtml5TreeOperation>::RemoveElementsAt(unsigned int start = 0x12f0c0, unsigned int count = 0x15b8080)+0xd [e:\builds\moz2_slave\mozilla-central-win32-nightly\build\obj-firefox\dist\include\nstarray.h @ 664]
0012f074 10713a42 xul!nsTArray<nsHtml5TreeOperation>::Clear(void)+0xe [e:\builds\moz2_slave\mozilla-central-win32-nightly\build\obj-firefox\dist\include\nstarray.h @ 675]
0012f080 10713b12 xul!nsHtml5TreeBuilder::end(void)+0x17 [e:\builds\moz2_slave\mozilla-central-win32-nightly\build\parser\html\nshtml5treebuildercppsupplement.h @ 280]
Yes, I filed a bug for that and I think someone else also filed a bug on that.
Yes, bug 514425 (came out of bug 513975).
I just got a crash @ http://lite.facebook.com/geeknik and the crash report pointed me to here. http://crash-stats.mozilla.com/report/index/808b0019-fc10-4b54-86a9-e11bc2090913 I've actually been seeing a lot of KiFastSystemCallRet crashes lately.
That's probably bug 514425, based on the frame below _purecall in your stack.
This got assigned to me during the crash-kill meeting, though likely I'll end up hounding someone else for the fix ;)
Assignee: nobody → jonas
blocking-1.9.2+ per CrashKill effort.
Flags: blocking1.9.2+
ted, It is unclear from your Comment #2 and Comment #3 if there is something I can do in Socorro to help this issue. You say that there is data that is not exposed to Socorro, but then you say that the problem is not really a Breakpad issue. So is there anything I can make Socorro do that it's not doing now?
There is info in the minidump that minidump_stackwalk doesn't report that says "this is a crash from a pure virtual function call". Exposing that would require hacking minidump_stackwalk or something. That being said, I think we can sort of cheat, and just drop those top 5 frames if they appear in that order, and then make _purecall an appending signature, so you'd get [@ _purecall | whatever ].
Given a stack like this:
Frame 0: KiFastSystemCallRet
Frame 1: NtWaitForSingleObject
Frame 2: WaitForSingleObjectEx
Frame 3: WaitForSingleObject
Frame 4: google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread
Frame 5: google_breakpad::ExceptionHandler::HandlePureVirtualCall
Frame 6: _purecall
Frame 7: nsHTMLFormElement::RemoveElement
Frame 8: nsGenericHTMLFormElement::ClearForm
Can I generalize the rule to say, "ignore all frame signatures with a lower number than a frame with a signature of '_purecall'"?
Doing an ad hoc tour through the database looking for "_purecall", I can see that it always appears in frames 4, 5, 6, or 7. The top of the stack (and therefore the overall crash signature) is one of: KiFastSystemCallRet (most common), NtWaitForSingleObject (occasional), ntdll.dll@0x1f871 (rare), ntdll.dll@0x65f34 (rare). There may be others; I did only a small sampling.
(In reply to comment #13)
> Can I generalize the rule to say, "ignore all frame signatures with a lower
> number than a frame with a signature of "_purecall"?
Sounds good to me.
(In reply to comment #13)
> Given a stack like this:
>
> Frame 0: KiFastSystemCallRet
> Frame 1: NtWaitForSingleObject
> Frame 2: WaitForSingleObjectEx
> Frame 3: WaitForSingleObject
> Frame 4: google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread
> Frame 5: google_breakpad::ExceptionHandler::HandlePureVirtualCall
> Frame 6: _purecall
> Frame 7: nsHTMLFormElement::RemoveElement
> Frame 8: nsGenericHTMLFormElement::ClearForm
>
> Can I generalize the rule to say, "ignore all frame signatures with a lower
> number than a frame with a signature of "_purecall"?
Yes. Calling _purecall is basically a crash, so anything called after that doesn't really matter. The *only* time I can think of when that isn't true is if we have broken stack (due to imperfect stack-walking in the face of optimizations, or due to trashed memory), and it so happens that we wrongly attribute some frame to "_purecall". However this seems very unlikely to happen. I've never seen that happen myself. If we start seeing it we can deal with it then. I strongly doubt we ever will.
> Doing an ad hoc tour through the database looking for "_purecall", I can see
> that it always appears in frames 4,5,6, or 7. The top of the stack (and
> therefore the overall crash signature) is one of: KiFastSystemCallRet (most
> common), NtWaitForSingleObject (occasional), ntdll.dll@0x1f871 (rare),
> ntdll.dll@0x65f34 (rare). There may be others, I did only a small sampling.
Seems like a good reason to always ignore anything below "_purecall" for now.
Sorry I don't really know who is in charge of what here. Lars, should you own this bug? Or Ted? Or someone else?
Lars is in charge of making changes to the way the processor generates signatures.
(In reply to comment #13)
> Given a stack like this:
>
> Frame 0: KiFastSystemCallRet
> Frame 1: NtWaitForSingleObject
> Frame 2: WaitForSingleObjectEx
> Frame 3: WaitForSingleObject
> Frame 4: google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread
> Frame 5: google_breakpad::ExceptionHandler::HandlePureVirtualCall
> Frame 6: _purecall
> Frame 7: nsHTMLFormElement::RemoveElement
> Frame 8: nsGenericHTMLFormElement::ClearForm
>
> Can I generalize the rule to say, "ignore all frame signatures with a lower
> number than a frame with a signature of "_purecall"?
Yes. As I said, I think this should wind up with a signature of [@ _purecall | nsHTMLFormElement::RemoveElement ].
> Doing an ad hoc tour through the database looking for "_purecall", I can see
> that it always appears in frames 4,5,6, or 7. The top of the stack (and
> therefore the overall crash signature) is one of: KiFastSystemCallRet (most
> common), NtWaitForSingleObject (occasional), ntdll.dll@0x1f871 (rare),
> ntdll.dll@0x65f34 (rare). There may be others, I did only a small sampling.
Not surprising that there's some variety, I guess. This thread is still live when it signals the other thread to write the dump, so there are probably slight differences due to timing, but the same general call stack. (The ntdll@0x... ones are just due to not having Win32 symbols at the time we process those crashes.)
Marking all topcrash bugs as P2 (3.6 release blockers, but not 3.6b1 blockers)
Priority: -- → P2
Lars: Any updates here? Is there an ETA?
Apparently I forgot to reassign to Lars as per comment 16.
Assignee: jonas → lars
The enhancement to the Socorro code was made late last week; we're just waiting for IT to respond to the staging request. Once they've done that and it passes muster (should be mere minutes later), I have to submit an IT request for the push to production.
This enhancement to the Socorro processors has now been submitted to IT for a push to production. Follow Bug 520789 for progress. Hit me up to be added to the CC list if you cannot see that bug.
For anyone interested, here's how the enhancement was made: there is a new configuration parameter called 'signatureSentinels' for the processors. When examining the stack of the crashing thread, a processor first scans for any frame signature in the list of signature sentinels. On finding one, it discards any frames with a lower frame number and then passes the truncated stack on for processing by further signature generation rules. In the case of '_purecall', that literal string appears in both the 'signatureSentinels' and 'prefixSignaturesRE' lists, so the stack is truncated to '_purecall' as the top and then '_purecall' is combined with the next frame signature to make a compound signature.
Examples:
Frame 0: KiFastSystemCallRet
Frame 1: NtWaitForSingleObject
Frame 2: WaitForSingleObjectEx
Frame 3: WaitForSingleObject
Frame 4: google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread
Frame 5: google_breakpad::ExceptionHandler::HandlePureVirtualCall
Frame 6: _purecall
Frame 7: nsHTMLFormElement::RemoveElement
Frame 8: nsGenericHTMLFormElement::ClearForm
becomes '_purecall | nsHTMLFormElement::RemoveElement'
Frame 0: KiFastSystemCallRet
Frame 1: ZwWaitForSingleObject
Frame 2: WaitForSingleObjectEx
Frame 3: WaitForSingleObject
Frame 4: google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread
Frame 5: google_breakpad::ExceptionHandler::HandlePureVirtualCall
Frame 6: _purecall
Frame 7: nsCOMPtr<nsIDOMHTMLAreaElement>::~nsCOMPtr<nsIDOMHTMLAreaElement>
Frame 8: nsEventDispatcher::Dispatch
Frame 9: FireFocusOrBlurEvent
Frame 10: nsEventStateManager::PreHandleEvent
Frame 11: PresShell::HandleEventInternal
Frame 12: PresShell::HandleEvent
becomes '_purecall | nsCOMPtr<nsIDOMHTMLAreaElement>::~nsCOMPtr<nsIDOMHTMLAreaElement>()'
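The two-step behavior described in this comment (truncate at a sentinel, then join prefix frames with the frame that follows) can be sketched roughly like this. It is a minimal illustration and not the actual Socorro processor code: the constant names, the function name, and the single-regex form of the prefix list are assumptions, and real frame signatures go through more normalization than shown here.

```python
import re

# Hypothetical stand-ins for the 'signatureSentinels' and
# 'prefixSignaturesRE' configuration values described above.
SIGNATURE_SENTINELS = ["_purecall"]
PREFIX_SIGNATURE_RE = re.compile(r"_purecall")


def generate_signature(frames,
                       sentinels=SIGNATURE_SENTINELS,
                       prefix_re=PREFIX_SIGNATURE_RE):
    """Return a signature for a list of frame signatures (top frame first)."""
    # Step 1: scan for the first sentinel and discard every frame
    # with a lower frame number than it.
    for index, frame in enumerate(frames):
        if frame in sentinels:
            frames = frames[index:]
            break
    # Step 2: keep appending frames while the current frame is a
    # "prefix" signature; stop at the first frame that is not.
    parts = []
    for frame in frames:
        parts.append(frame)
        if not prefix_re.fullmatch(frame):
            break
    return " | ".join(parts)


example = [
    "KiFastSystemCallRet",
    "NtWaitForSingleObject",
    "WaitForSingleObjectEx",
    "WaitForSingleObject",
    "google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread",
    "google_breakpad::ExceptionHandler::HandlePureVirtualCall",
    "_purecall",
    "nsHTMLFormElement::RemoveElement",
    "nsGenericHTMLFormElement::ClearForm",
]
print(generate_signature(example))
```

Because the sentinel scan does not depend on a frame number, the same sketch handles stacks where _purecall shows up at frame 4, 5, 6, or 7, and stacks with no sentinel fall through to the ordinary top-frame signature.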
Awesome! Thanks Lars, that's perfect
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Does this fix apply only to crash reports that arrive after the bug was fixed, or does it also apply retroactively to older crash reports?
Only new reports.
Note to self and others who are interested in helping: we should verify this by watching for a decline in this signature and a redistribution of these crashes across other signatures. Jesse says the highest is around #81 topcrash...
Yep. Only 4 crashes today showed up as KiFastSystemCallRet. There are now many "_purecall | *" crashes showing up, including one signature that makes the overall topcrash list.
Status: RESOLVED → VERIFIED
I haven't had one of these crashes since around 13 September.
Whiteboard: [1.9.2-fixed]
Blocks: 575372
Component: Socorro → General
Product: Webtools → Socorro