Closed
Bug 514589
Opened 15 years ago
Closed 15 years ago
Pure virtual function call crashes show up as [@ KiFastSystemCallRet]
Categories
(Socorro :: General, task, P2)
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: jruderman, Assigned: lars)
References
Details
(Whiteboard: [1.9.2-fixed])
Most or all [@ KiFastSystemCallRet] crashes are actually pure virtual function calls in various parts of the codebase. They also have "No crash" as the crash reason, which is incorrect; the reason should be "Pure virtual function call".
Treated as a single crash, it's topcrash #21 for 3.5.2. But it should be treated as separate crashes, e.g. [@ nsHTMLFormElement::RemoveElement] for the following example.
http://crash-stats.mozilla.com/report/index/e0e5731a-e16d-46ab-9357-186c52090830
KiFastSystemCallRet
NtWaitForSingleObject
WaitForSingleObjectEx
WaitForSingleObject
google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread
google_breakpad::ExceptionHandler::HandlePureVirtualCall
_purecall
nsHTMLFormElement::RemoveElement
nsGenericHTMLFormElement::ClearForm
arena_dalloc
nsHTMLInputElement::`scalar deleting destructor'
nsNodeUtils::LastRelease
nsGenericElement::Release
nsCOMPtr_base::~nsCOMPtr_base
...
Note that it'd be good to distinguish [@ _purecall - nsHTMLFormElement::RemoveElement] from [@ nsHTMLFormElement::RemoveElement].
Of the two crashes with [@ KiFastSystemCallRet], one of them also had a similar signature but was a simple null pointer deref.
There's a huge difference between a null pointer deref and a _purecall.
Comment 2•15 years ago
This is also related to bug 397199. We do have an assertion object in the minidump that's not currently exposed to Socorro. The assertion object contains at least an enum telling what type of problem we hit:
http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/google-breakpad/src/google_breakpad/common/minidump_format.h#703
With this info, Socorro could know to handle these better. We might be able to implement a workaround though, something like "if frame 6 is _purecall, the signature should be frame 6 + frame 7". (It should always be frame 6, if we hit this type of crash.)
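Here is a minimal sketch of the fixed-position workaround suggested above, written in Python purely for illustration. It assumes the crashing thread's stack is available as a list of frame signature strings ordered from frame 0 downward. The function name `purecall_signature_fixed` is hypothetical and not part of Socorro.

```python
from typing import Sequence


def purecall_signature_fixed(frames: Sequence[str]) -> str:
    """Hypothetical workaround: if frame 6 is _purecall, build the signature
    from frames 6 and 7; otherwise fall back to the top frame (frame 0)."""
    if len(frames) > 7 and frames[6] == "_purecall":
        return f"{frames[6]} | {frames[7]}"
    return frames[0] if frames else ""


# The stack from comment 0, frame 0 first.
stack = [
    "KiFastSystemCallRet",
    "NtWaitForSingleObject",
    "WaitForSingleObjectEx",
    "WaitForSingleObject",
    "google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread",
    "google_breakpad::ExceptionHandler::HandlePureVirtualCall",
    "_purecall",
    "nsHTMLFormElement::RemoveElement",
    "nsGenericHTMLFormElement::ClearForm",
]
print(purecall_signature_fixed(stack))  # _purecall | nsHTMLFormElement::RemoveElement
```

Because the rule is tied to a fixed index, it silently falls back to the top frame whenever `_purecall` lands on any frame other than 6.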
Comment 3•15 years ago
I don't think this is a Breakpad bug. Breakpad is reporting this info to the best of its ability; it's just that Socorro currently doesn't have access to the full info.
Component: Breakpad Integration → Socorro
Product: Toolkit → Webtools
QA Contact: breakpad.integration → socorro
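Comment 2 points at the assertion-info structure in Breakpad's minidump_format.h. Below is a rough Python sketch of decoding the raw payload of that stream. The layout (three 0-terminated UTF-16 buffers of 128 code units for expression, function and file, followed by 32-bit `line` and `type` fields, little-endian) and the enum values reflect my reading of that header and should be checked against it. The helper names are hypothetical, and locating the stream inside a minidump is not shown.

```python
import struct

# Assumed values of MDAssertionInfoData in Breakpad's minidump_format.h.
MD_ASSERTION_INFO_TYPE_UNKNOWN = 0
MD_ASSERTION_INFO_TYPE_INVALID_PARAMETER = 1
MD_ASSERTION_INFO_TYPE_PURE_VIRTUAL_CALL = 2

# Assumed MDRawAssertionInfo layout: uint16_t expression[128], function[128],
# file[128], then uint32_t line and uint32_t type; little-endian, no padding.
ASSERTION_INFO_FMT = "<256s256s256sII"


def _utf16z(raw: bytes) -> str:
    """Decode a fixed-size UTF-16LE buffer up to its first NUL."""
    return raw.decode("utf-16-le", errors="replace").split("\x00", 1)[0]


def parse_assertion_info(payload: bytes) -> dict:
    """Unpack the raw bytes of an assertion-info stream into a dict."""
    expression, function, source_file, line, kind = struct.unpack_from(
        ASSERTION_INFO_FMT, payload
    )
    return {
        "expression": _utf16z(expression),
        "function": _utf16z(function),
        "file": _utf16z(source_file),
        "line": line,
        "type": kind,
    }


def is_pure_virtual_call(payload: bytes) -> bool:
    """True if the stream says it was written for a pure virtual call."""
    info = parse_assertion_info(payload)
    return info["type"] == MD_ASSERTION_INFO_TYPE_PURE_VIRTUAL_CALL
```

With a check like `is_pure_virtual_call`, a processor could set the crash reason to "Pure virtual function call" without depending on frame names at all.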
Comment 4•15 years ago
Also, FWIW, is there a bug filed on the actual crash whose stack is in comment 0? I just had someone give me a minidump showing a very similar stack:
0012ee70 100382c2 mozcrt19!_purecall(void)+0x14 [e:\builds\moz2_slave\mozilla-central-win32-nightly\build\obj-firefox\memory\jemalloc\crtsrc\purevirt.c @ 54]
0012ee94 1003ff00 xul!nsHTMLFormElement::RemoveElement(class nsIFormControl * aChild = 0x02e5041c, int aNotify = 0)+0x10 [e:\builds\moz2_slave\mozilla-central-win32-nightly\build\content\html\content\src\nshtmlformelement.cpp @ 1463]
0012efe0 1005996c xul!nsGenericHTMLFormElement::ClearForm(int aRemoveFromForm = 1, int aNotify = 0)+0xd5 [e:\builds\moz2_slave\mozilla-central-win32-nightly\build\content\html\content\src\nsgenerichtmlelement.cpp @ 2327]
0012effc 1001ce56 xul!nsHTMLInputElement::~nsHTMLInputElement(void)+0xbe [e:\builds\moz2_slave\mozilla-central-win32-nightly\build\content\html\content\src\nshtmlinputelement.cpp @ 477]
0012f004 10113822 xul!nsHTMLInputElement::`scalar deleting destructor'(void)+0x8
0012f024 101262bd xul!nsNodeUtils::LastRelease(class nsINode * aNode = 0x10713b73)+0xa2 [e:\builds\moz2_slave\mozilla-central-win32-nightly\build\content\base\src\nsnodeutils.cpp @ 260]
0012f034 10124e3e xul!nsGenericElement::Release(void)+0x1d [e:\builds\moz2_slave\mozilla-central-win32-nightly\build\content\base\src\nsgenericelement.cpp @ 4218]
0012f03c 1048ea3d xul!nsCOMPtr_base::~nsCOMPtr_base(void)+0xe [e:\builds\moz2_slave\mozilla-central-win32-nightly\build\obj-firefox\xpcom\build\nscomptr.cpp @ 82]
0012f044 1049335b xul!nsHtml5TreeOperation::~nsHtml5TreeOperation(void)+0x1d [e:\builds\moz2_slave\mozilla-central-win32-nightly\build\parser\html\nshtml5treeoperation.cpp @ 60]
0012f04c 104933a5 xul!nsHtml5TreeOperation::`scalar deleting destructor'(void)+0x9
0012f05c 10713931 xul!nsTArray<nsHtml5TreeOperation>::DestructRange(unsigned int start = 0x12f0c0, unsigned int count = 0x15b8080)+0x1c [e:\builds\moz2_slave\mozilla-central-win32-nightly\build\obj-firefox\dist\include\nstarray.h @ 862]
0012f068 1071397b xul!nsTArray<nsHtml5TreeOperation>::RemoveElementsAt(unsigned int start = 0x12f0c0, unsigned int count = 0x15b8080)+0xd [e:\builds\moz2_slave\mozilla-central-win32-nightly\build\obj-firefox\dist\include\nstarray.h @ 664]
0012f074 10713a42 xul!nsTArray<nsHtml5TreeOperation>::Clear(void)+0xe [e:\builds\moz2_slave\mozilla-central-win32-nightly\build\obj-firefox\dist\include\nstarray.h @ 675]
0012f080 10713b12 xul!nsHtml5TreeBuilder::end(void)+0x17 [e:\builds\moz2_slave\mozilla-central-win32-nightly\build\parser\html\nshtml5treebuildercppsupplement.h @ 280]
Comment 5•15 years ago
Yes, I filed a bug for that and I think someone else also filed a bug on that.
Reporter
Comment 6•15 years ago
Yes, bug 514425 (came out of bug 513975).
Comment 7•15 years ago
I just got a crash @ http://lite.facebook.com/geeknik and the crash report pointed me here.
http://crash-stats.mozilla.com/report/index/808b0019-fc10-4b54-86a9-e11bc2090913
I've actually been seeing a lot of KiFastSystemCallRet crashes lately.
Reporter
Comment 8•15 years ago
That's probably bug 514425, based on the frame below _purecall in your stack.
This got assigned to me during the crash-kill meeting, though likely I'll end up hounding someone else for the fix ;)
Assignee: nobody → jonas
Assignee
Comment 11•15 years ago
Ted, it is unclear from your comment #2 and comment #3 whether there is something I can do in Socorro to help with this issue. You say that there is data that is not exposed to Socorro, but then you say that the problem is not really a Breakpad issue. So is there anything I can make Socorro do that it's not doing now?
Comment 12•15 years ago
There is info in the minidump that minidump_stackwalk doesn't report that says "this is a crash from a pure virtual function call". Exposing that would require hacking minidump_stackwalk or something.
That being said, I think we can sort of cheat, and just drop those top 5 frames if they appear in that order, and then make _purecall an appending signature, so you'd get [@ _purecall | whatever ].
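Here is a sketch of the "cheat" described above, in Python for illustration. It drops the Breakpad handler frames only when they appear at the top of the stack in exactly the order seen in comment 0 (frames 0 through 5 there), then treats `_purecall` as an appending signature. `HANDLER_FRAMES` and the function names are hypothetical.

```python
from typing import List, Sequence

# Hypothetical: frames 0 through 5 of the stack in comment 0.
HANDLER_FRAMES = (
    "KiFastSystemCallRet",
    "NtWaitForSingleObject",
    "WaitForSingleObjectEx",
    "WaitForSingleObject",
    "google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread",
    "google_breakpad::ExceptionHandler::HandlePureVirtualCall",
)


def strip_handler_frames(frames: Sequence[str]) -> List[str]:
    """Drop HANDLER_FRAMES from the top of the stack only on an exact, in-order match."""
    n = len(HANDLER_FRAMES)
    if tuple(frames[:n]) == HANDLER_FRAMES:
        return list(frames[n:])
    return list(frames)


def signature_after_strip(frames: Sequence[str]) -> str:
    """Build a signature, joining "_purecall" with the frame that called it."""
    stack = strip_handler_frames(frames)
    if not stack:
        return ""
    if stack[0] == "_purecall" and len(stack) > 1:
        return f"{stack[0]} | {stack[1]}"
    return stack[0]
```

The exact-match requirement is also the weakness of this approach: a stack whose top frames differ even slightly (for example `ZwWaitForSingleObject` in place of `NtWaitForSingleObject`) keeps its old signature.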
Assignee
Comment 13•15 years ago
Given a stack like this:
Frame 0: KiFastSystemCallRet
Frame 1: NtWaitForSingleObject
Frame 2: WaitForSingleObjectEx
Frame 3: WaitForSingleObject
Frame 4: google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread
Frame 5: google_breakpad::ExceptionHandler::HandlePureVirtualCall
Frame 6: _purecall
Frame 7: nsHTMLFormElement::RemoveElement
Frame 8: nsGenericHTMLFormElement::ClearForm
Can I generalize the rule to say, "ignore all frame signatures with a lower number than a frame with a signature of '_purecall'"?
Doing an ad hoc tour through the database looking for "_purecall", I can see that it always appears in frames 4, 5, 6, or 7. The top of the stack (and therefore the overall crash signature) is one of: KiFastSystemCallRet (most common), NtWaitForSingleObject (occasional), ntdll.dll@0x1f871 (rare), ntdll.dll@0x65f34 (rare). There may be others; I only did a small sampling.
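The generalized rule proposed here can be sketched in Python as a scan for the first `_purecall` frame that keeps that frame and everything after it. The function name is hypothetical. The point of the sketch is that the rule does not care which frame number `_purecall` lands on or what sits above it.

```python
from typing import List, Sequence


def truncate_at_sentinel(frames: Sequence[str], sentinel: str = "_purecall") -> List[str]:
    """Discard every frame numbered lower than the first frame equal to
    `sentinel`; return the stack unchanged if the sentinel never appears."""
    for number, frame in enumerate(frames):
        if frame == sentinel:
            return list(frames[number:])
    return list(frames)


# Hypothetical stack: unsymbolicated ntdll top frame, _purecall at frame 4.
example = [
    "ntdll.dll@0x1f871",
    "WaitForSingleObject",
    "google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread",
    "google_breakpad::ExceptionHandler::HandlePureVirtualCall",
    "_purecall",
    "nsHTMLFormElement::RemoveElement",
]
print(truncate_at_sentinel(example))  # ['_purecall', 'nsHTMLFormElement::RemoveElement']
```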
(In reply to comment #13)
> Can I generalize the rule to say, "ignore all frame signatures with a lower
> number than a frame with a signature of "_purecall"?
Sounds good to me.
(In reply to comment #13)
> Given a stack like this:
>
> Frame 0: KiFastSystemCallRet
> Frame 1: NtWaitForSingleObject
> Frame 2: WaitForSingleObjectEx
> Frame 3: WaitForSingleObject
> Frame 4: google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread
> Frame 5: google_breakpad::ExceptionHandler::HandlePureVirtualCall
> Frame 6: _purecall
> Frame 7: nsHTMLFormElement::RemoveElement
> Frame 8: nsGenericHTMLFormElement::ClearForm
>
> Can I generalize the rule to say, "ignore all frame signatures with a lower
> number than a frame with a signature of "_purecall"?
Yes. Calling _purecall is basically a crash, so anything called after that doesn't really matter.
The *only* time I can think of when that isn't true is if we have a broken stack (due to imperfect stack-walking in the face of optimizations, or due to trashed memory) and some frame happens to be wrongly attributed to "_purecall".
However, this seems very unlikely. I've never seen it happen myself. If we start seeing it we can deal with it then, but I strongly doubt we ever will.
> Doing an ad hoc tour through the database looking for "_purecall", I can see
> that it always appears in frames 4,5,6, or 7. The top of the stack (and
> therefore the overall crash signature) is one of: KiFastSystemCallRet (most
> common), NtWaitForSingleObject (occasional), ntdll.dll@0x1f871 (rare),
> ntdll.dll@0x65f34 (rare). There may be others, I did only a small sampling.
That seems like a good reason to always ignore the frames called from "_purecall" (those with lower frame numbers) for now.
Sorry, I don't really know who is in charge of what here. Lars, should you own this bug? Or Ted? Or someone else?
Comment 16•15 years ago
Lars is in charge of making changes to the way the processor generates signatures.
(In reply to comment #13)
> Given a stack like this:
>
> Frame 0: KiFastSystemCallRet
> Frame 1: NtWaitForSingleObject
> Frame 2: WaitForSingleObjectEx
> Frame 3: WaitForSingleObject
> Frame 4: google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread
> Frame 5: google_breakpad::ExceptionHandler::HandlePureVirtualCall
> Frame 6: _purecall
> Frame 7: nsHTMLFormElement::RemoveElement
> Frame 8: nsGenericHTMLFormElement::ClearForm
>
> Can I generalize the rule to say, "ignore all frame signatures with a lower
> number than a frame with a signature of "_purecall"?
Yes. As I said, I think this should wind up with a signature of [@ _purecall | nsHTMLFormElement::RemoveElement ].
> Doing an ad hoc tour through the database looking for "_purecall", I can see
> that it always appears in frames 4,5,6, or 7. The top of the stack (and
> therefore the overall crash signature) is one of: KiFastSystemCallRet (most
> common), NtWaitForSingleObject (occasional), ntdll.dll@0x1f871 (rare),
> ntdll.dll@0x65f34 (rare). There may be others, I did only a small sampling.
Not surprising that there's some variety, I guess. This thread is still live when it signals the other thread to write the dump, so there are probably slight differences due to timing, but the same general call stack. (The ntdll@0x... ones are just due to not having Win32 symbols at the time we process those crashes.)
Comment 17•15 years ago
Marking all topcrash bugs as P2 (3.6 release blockers, but not 3.6b1 blockers)
Updated•15 years ago
Priority: -- → P2
Lars: Any updates here? Is there an ETA?
Apparently I forgot to reassign to Lars as per comment 16.
Assignee: jonas → lars
Assignee
Comment 20•15 years ago
The enhancement to the Socorro code was made late last week; we're just waiting for IT to respond to the staging request. Once they've done that and it passes muster (which should take mere minutes), I'll have to submit an IT request for the push to production.
Assignee
Comment 21•15 years ago
This enhancement to the Socorro processors has now been submitted to IT for a push to production. Follow bug 520789 for progress. Ask me to add you to the CC list if you cannot see that bug.
For anyone interested, here's how the enhancement was made: there is a new configuration parameter called 'signatureSentinels' for the processors. When examining the stack of the crashing thread, a processor first scans for any frame signature in the list of signature sentinels. On finding one, it discards any frames with a lower frame number and then passes the truncated stack on for processing by further signature generation rules. In the case of '_purecall', that literal string appears in both the 'signatureSentinels' and 'prefixSignaturesRE' lists, so the stack is truncated to '_purecall' as the top and then '_purecall' is combined with the next frame signature to make a compound signature.
Examples:
Frame 0: KiFastSystemCallRet
Frame 1: NtWaitForSingleObject
Frame 2: WaitForSingleObjectEx
Frame 3: WaitForSingleObject
Frame 4: google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread
Frame 5: google_breakpad::ExceptionHandler::HandlePureVirtualCall
Frame 6: _purecall
Frame 7: nsHTMLFormElement::RemoveElement
Frame 8: nsGenericHTMLFormElement::ClearForm
becomes '_purecall | nsHTMLFormElement::RemoveElement'
Frame 0: KiFastSystemCallRet
Frame 1: ZwWaitForSingleObject
Frame 2: WaitForSingleObjectEx
Frame 3: WaitForSingleObject
Frame 4: google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread
Frame 5: google_breakpad::ExceptionHandler::HandlePureVirtualCall
Frame 6: _purecall
Frame 7: nsCOMPtr<nsIDOMHTMLAreaElement>::~nsCOMPtr<nsIDOMHTMLAreaElement>
Frame 8: nsEventDispatcher::Dispatch
Frame 9: FireFocusOrBlurEvent
Frame 10: nsEventStateManager::PreHandleEvent
Frame 11: PresShell::HandleEventInternal
Frame 12: PresShell::HandleEvent
becomes '_purecall | nsCOMPtr<nsIDOMHTMLAreaElement>::~nsCOMPtr<nsIDOMHTMLAreaElement>()'
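The processor change described above can be approximated in a short Python sketch. `signatureSentinels` and `prefixSignaturesRE` are the configuration names given in the comment; their Python stand-ins below, the helper names, and the assumption that prefix frames keep chaining until the first non-prefix frame are illustrative only, not Socorro's actual code.

```python
import re
from typing import List, Sequence

# Illustrative stand-ins for the 'signatureSentinels' and 'prefixSignaturesRE' settings.
SIGNATURE_SENTINELS = ("_purecall",)
PREFIX_SIGNATURES_RE = re.compile(r"^_purecall$")


def apply_sentinels(frames: Sequence[str], sentinels: Sequence[str]) -> List[str]:
    """Drop all frames numbered lower than the first sentinel frame, if any."""
    for number, frame in enumerate(frames):
        if frame in sentinels:
            return list(frames[number:])
    return list(frames)


def generate_signature(
    frames: Sequence[str],
    sentinels: Sequence[str] = SIGNATURE_SENTINELS,
    prefix_re: re.Pattern = PREFIX_SIGNATURES_RE,
) -> str:
    """Truncate at a sentinel, then join prefix frames with the frame that follows."""
    parts: List[str] = []
    for frame in apply_sentinels(frames, sentinels):
        parts.append(frame)
        if not prefix_re.match(frame):
            break  # the first non-prefix frame ends the compound signature
    return " | ".join(parts)
```

On the two example stacks above, this yields `_purecall | nsHTMLFormElement::RemoveElement` and `_purecall | nsCOMPtr<nsIDOMHTMLAreaElement>::~nsCOMPtr<nsIDOMHTMLAreaElement>`. The trailing `()` in the second quoted result suggests the real processor also normalizes frame signatures, which this sketch does not attempt.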
Awesome! Thanks Lars, that's perfect
Assignee
Updated•15 years ago
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Does this fix apply only to crash reports that arrive after the bug was fixed, or does it also apply retroactively to older crash reports?
Comment 24•15 years ago
Only new reports.
Comment 25•15 years ago
Note to self and others who are interested in helping:
We should verify this by watching for a decline in this signature and a redistribution of these crashes across other signatures. Jesse says the highest is around #81 on the topcrash list.
Reporter
Comment 26•15 years ago
Yep. Only 4 crashes today showed up as KiFastSystemCallRet. There are now many "_purecall | *" crashes showing up, including one signature that makes the overall topcrash list.
Status: RESOLVED → VERIFIED
Comment 27•15 years ago
I haven't had one of these crashes since around 13 September.
Updated•15 years ago
Whiteboard: [1.9.2-fixed]
Updated•13 years ago
Component: Socorro → General
Product: Webtools → Socorro