Closed Bug 524256 Opened 15 years ago Closed 15 years ago

Crash stacks not unwound past XPCWrappedNative::CallMethod

Categories

(Core :: XPCOM, defect, P2)

defect

Tracking

()

VERIFIED FIXED
mozilla1.9.3a1
Tracking Status
status1.9.2 --- beta2-fixed
status1.9.1 --- .8-fixed

People

(Reporter: jruderman, Assigned: neil)

References

Details

(Keywords: verified1.9.1, Whiteboard: [crashkill])

Attachments

(2 files)

Stacks for many signatures are getting cut off at XPCWrappedNative::CallMethod.

bp-8c98528d-a4d5-4ee5-a0cc-c26332091023  
Bug 505059 -- [@ CSSLoaderImpl::LoadSheet...]

bp-d04a9331-4adc-4240-8435-bc3c52091022  
Bug 524255 -- [@ _purecall | CallQueryInterface...]
  
CCing the heroes of bug 519616.
Seems more likely a Gecko bug, in that our assembly-coded NS_InvokeByIndex_P probably isn't providing all the debug information that compiler-generated code does.
(In reply to comment #1)
> Seems more likely a Gecko bug, in that our assembly-coded NS_InvokeByIndex_P
> probably isn't providing all the debug information that compiler-generated code
> does.

I know it's possible to emit CFI annotations on linux and presumably OS X (http://lkml.indiana.edu/hypermail/linux/kernel/0509.1/0118.html). Has anyone seen anything similar for MSVC?

Still, even without stack unwind information, we should be able to unwind the stack. Does anyone have a minidump of a crash that fails like these ones?
Assuming this is Windows-only: does NS_InvokeByIndex use a frame pointer? Breakpad (and most debuggers) assume that if debugging information is not present for a function, the function uses a frame pointer in EBP

http://mxr.mozilla.org/mozilla-central/source/xpcom/reflect/xptcall/src/md/win32/xptcinvoke.cpp#79
Component: Socorro → XPCOM
Product: Webtools → Core
QA Contact: socorro → xpcom
(In reply to comment #3)
> Assuming this is Windows-only: does NS_InvokeByIndex use a frame pointer?
> Breakpad (and most debuggers) assume that if debugging information is not
> present for a function, the function uses a frame pointer in EBP

Yes, it appears to use a frame pointer.
I wonder if declspec(naked) emits some sort of debugging information that isn't correct for the function.
Jeff: you can easily produce one using "Crash Me Now!" and grabbing the minidump out of the Crash Reports/pending/ directory before quitting the crash reporter:

Stacks from that look like:
http://crash-stats.mozilla.com/report/index/bp-73cd938b-98ae-4148-9757-4433b2090929
Running the stackwalker on the minidump from this crash:
http://crash-stats.mozilla.com/report/index/bd254b84-4ed3-4a37-a763-268702091026
gives me a much more useful. Perhaps breakpad was not properly updated?
AFAICT the new code is running in production, evidenced by the lack of 0x0 crashes in the topcrash list.
When I run minidump_stackwalk (from SVN tip) on your dump, I just get:
Thread 0 (crashed)
 0  crashme.dll!nsCrasher::Crash() [nsCrasher.cpp : 21 + 0x0]
    eip = 0x009311bd   esp = 0x0013eae0   ebp = 0x0013eae8   ebx = 0x0013eda8
    esi = 0x00000000   edi = 0x00000003   eax = 0x00000003   ecx = 0x038d1120
    edx = 0x00951c00   efl = 0x00010246
    Found by: given as instruction pointer in context
 1  xul.dll!NS_InvokeByIndex_P [xptcinvoke.cpp:ef4565b6f268 : 101 + 0x2]
    eip = 0x50691636   esp = 0x0013eae4   ebp = 0x0013eae8
    Found by: call frame info
 2  xul.dll!XPCWrappedNative::CallMethod(XPCCallContext &,XPCWrappedNative::Call
Mode) [xpcwrappednative.cpp:ef4565b6f268 : 2690 + 0x1a]
    eip = 0x5052e7f1   esp = 0x0013eaf0   ebp = 0x0013eae8
    Found by: call frame info with scanning

in the crashing thread, same as crash-stats. You can see that it resorts to stack scanning to even get frame 2, then it must just get lost.
(In reply to comment #10)
> When I run minidump_stackwalk (from SVN tip) on your dump, I just get:
> Thread 0 (crashed)
>  0  crashme.dll!nsCrasher::Crash() [nsCrasher.cpp : 21 + 0x0]
>     eip = 0x009311bd   esp = 0x0013eae0   ebp = 0x0013eae8   ebx = 0x0013eda8
>     esi = 0x00000000   edi = 0x00000003   eax = 0x00000003   ecx = 0x038d1120
>     edx = 0x00951c00   efl = 0x00010246
>     Found by: given as instruction pointer in context
>  1  xul.dll!NS_InvokeByIndex_P [xptcinvoke.cpp:ef4565b6f268 : 101 + 0x2]
>     eip = 0x50691636   esp = 0x0013eae4   ebp = 0x0013eae8
>     Found by: call frame info
>  2  xul.dll!XPCWrappedNative::CallMethod(XPCCallContext
> &,XPCWrappedNative::Call
> Mode) [xpcwrappednative.cpp:ef4565b6f268 : 2690 + 0x1a]
>     eip = 0x5052e7f1   esp = 0x0013eaf0   ebp = 0x0013eae8
>     Found by: call frame info with scanning
> 
> in the crashing thread, same as crash-stats. You can see that it resorts to
> stack scanning to even get frame 2, then it must just get lost.

The reason you run into the problem is because you have symbols for crashme.dll. I didn't have those so I was going from Crash() to CallMethod() and using the frame pointer from there.

Why does the crash server have symbols for crashme.dll and how do I get them? They don't seem to be listed in symbol index file.
Ah, interesting. I manually uploaded the crashme symbols after I built the extension. They're in the same directory as our OS symbols, which isn't available via HTTP. I copied it over to people if you want it:
http://people.mozilla.org/~tmielczarek/crashme.sym
Yeah, that is interesting. If I process it without the crashme symbols it gets the full stack. I guess the NS_InvokeByIndex_P frame is kind of a tar pit. Regardless, the crashes in comment 0 have the same exact problem exhibited by your minidump when we have symbols for the top of the stack, so we should figure out why this is. Here's the top 4 frames I get when I process without crashme symbols:
Thread 0 (crashed)
 0  crashme.dll + 0x11bd
    eip = 0x009311bd   esp = 0x0013eae0   ebp = 0x0013eae8   ebx = 0x0013eda8
    esi = 0x00000000   edi = 0x00000003   eax = 0x00000003   ecx = 0x038d1120
    edx = 0x00951c00   efl = 0x00010246
    Found by: given as instruction pointer in context
 1  xul.dll!XPCWrappedNative::CallMethod(XPCCallContext &,XPCWrappedNative::Call
Mode) [xpcwrappednative.cpp:ef4565b6f268 : 2690 + 0x1a]
    eip = 0x5052e7f1   esp = 0x0013eaf0   ebp = 0x0013ed78
    Found by: previous frame's frame pointer
 2  xul.dll!XPC_WN_CallMethod(JSContext *,JSObject *,unsigned int,int *,int *) [
xpcwrappednativejsops.cpp:ef4565b6f268 : 1756 + 0x11]
    eip = 0x5052ad40   esp = 0x0013ed80   ebp = 0x035010b4
    Found by: previous frame's frame pointer
 3  mozjs.dll!js_Invoke [jsinterp.cpp:ef4565b6f268 : 1365 + 0x14]
    eip = 0x623a0aca   esp = 0x0013ee4c   ebp = 0x0013eee8
    Found by: call frame info
 4  mozjs.dll!js_Interpret [jsops.cpp:ef4565b6f268 : 2291 + 0xd]
    eip = 0x623986e1   esp = 0x0013eef0   ebp = 0x0013f0e4
    Found by: previous frame's frame pointer

I poked through the debug info a little bit, and here's a brain dump. NS_InvokeByIndex_P looks like this:
FILE 338447 hg:hg.mozilla.org/mozilla-central:xpcom/reflect/xptcall/src/md/win32/xptcinvoke.cpp:ef4565b6f268
FUNC 27160f 2b 10 NS_InvokeByIndex_P
STACK WIN 4 27160f 2b 0 0 10 0 0 0 1 $T2 $esp .cbLocals + .cbSavedRegs + = $T0 .raSearchStart = $eip $T0 ^ = $esp $T0 4 + =

That program string says:
$T0 = .raSearchStart
$eip = *$T0
$esp = $T0 + 4
(Where .raSearchStart is http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/stackwalker_x86.cc#160, 0x13eae8 in this specific case)
The value of %eip comes out bogus here (0x13ed78), so we resort to stack scanning, which does find the next frame. However, it gives us %eip = 0x5052e7f1 (rva = 0x10e7f1), which is actually way past the end of XPC_WN_CallMethod (rva = 0x10aba0, size = 0x26a). Since the IP isn't actually contained in this function, we fail to find the FPO data for it here:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/basic_source_line_resolver.cc#427

Which means we have to resort to %ebp walking, but our value of %ebp is clearly not valid for this frame, so we just fail. I think this indicates that there's something failing in the debug info for NS_InvokeByIndex_P, since we should be able to find the next frame from there strictly from program string evaluation. I wouldn't be surprised to find that the __declspec(naked) is doing something weird.
Yeah, that program string is wrong for NS_InvokeByIndex: no reference to $ebp, it's assuming the function doesn't have a frame pointer because optimizations are on and it doesn't see the assembly. I have a patch which might work.
I don't think we should spend a lot of time on this, while it's clearly a bug, I don't think the resulting stacks are likely to help much with crash analysis. For example, the rest of the stack in this "crash me now" crash would just tell you that JavaScript was invoked from the event loop.

(But if bsmedberg's patch works, then great, let's just fix it.)
It could help in bug 505059 by telling us where we are in shutdown, or in bug 524255 by telling us if the _purecall is due to there being a destructor on the stack.  And it would dispel the feeling that the stacks are "busted" and therefore ignorable.
Also, XPCWrappedNative is called from a lot of places (we do sometimes do have stacks that go C++ -> JS -> C++) and having that extra context can be useful.
#pragmas FTW
This basically suppresses the custom STACK operations for NS_InvokeByIndex so that the default frame-pointer operation takes over.
Assignee: nobody → benjamin
Status: NEW → ASSIGNED
Attachment #408618 - Flags: review?(ted.mielczarek)
Comment on attachment 408618 [details] [diff] [review]
Use #pragmas to tell MSVC that NS_InvokeByIndex has a frame pointer, rev. 1

Nice!
Attachment #408618 - Flags: review?(ted.mielczarek) → review+
Comment on attachment 408618 [details] [diff] [review]
Use #pragmas to tell MSVC that NS_InvokeByIndex has a frame pointer, rev. 1

We should land this on branches too. It's very simple, doesn't change any behavior, and will help us get better crash stacks immediately.
Attachment #408618 - Flags: approval1.9.2+
Attachment #408618 - Flags: approval1.9.1.5?
Landed in m-c:
http://hg.mozilla.org/mozilla-central/rev/15f4223cd977
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Whiteboard: [crashkill]
Can someone verify using today's nightly that this in fact is fixed, before I start on the branches?
Today's nightly shows:
FUNC 2763cf 2b 10 NS_InvokeByIndex_P
STACK WIN 4 2763cf 2b 0 0 10 0 0 0 1 $T2 $esp .cbLocals + .cbSavedRegs + = $T0 .
raSearchStart = $eip $T0 ^ = $esp $T0 4 + =

Perhaps WPO or PGO (or both) are screwing you up here? I notice the optimize pragma takes a "g" for "global optimizations". Might be worth a shot.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Also, I tried the obvious "process a minidump from crash me now produced with today's nightly" before looking at the symbol files.)
Other than "delete the NS_InvokeByIndex_P lines from the .sym file" I don't know what to do with this next.
Assignee: benjamin → nobody
Flags: blocking1.9.2+
Priority: -- → P2
Attached patch Possible patchSplinter Review
Maybe allowing the compiler to generate the prologue and epilogue would help.

[I checked and the #pragma forces it even in opt builds.]
Attachment #409673 - Flags: review?(ted.mielczarek)
Comment on attachment 409673 [details] [diff] [review]
Possible patch

I am not qualified to review this. Also, if you can make a try build with this, I can test to see if it fixes the problem.
Attachment #409673 - Flags: review?(ted.mielczarek) → review?(benjamin)
Comment on attachment 408618 [details] [diff] [review]
Use #pragmas to tell MSVC that NS_InvokeByIndex has a frame pointer, rev. 1

Clearing approvals since this patch didn't fix our problem.
Attachment #408618 - Flags: approval1.9.2+
Attachment #408618 - Flags: approval1.9.1.5?
Comment on attachment 409673 [details] [diff] [review]
Possible patch

I'd be a little surprised if this actually worked, but I don't think it can hurt.
Attachment #409673 - Flags: review?(benjamin) → review+
(In reply to comment #25)
> Other than "delete the NS_InvokeByIndex_P lines from the .sym file" I don't
> know what to do with this next.

Can we give that a try and see if that helps? This is blocking investigation of a couple of top crash bugs...
There's a patch here, we should see if this fixes the issue before trying other crazy things.
Comment on attachment 409673 [details] [diff] [review]
Possible patch

Pushed changeset c0d760c5165c to mozilla-central.
Assignee: nobody → neil
Blocks: 526224
With today's nightly:
http://crash-stats.mozilla.com/report/index/4155267b-3ea7-4ec8-a3ed-ee14f2091105

Looks good, although we still get hung up at nsRunnable::Release for some reason. We can handle that separately though. This bug is blocking1.9.2+, so feel free to land on 1.9.2 at your leisure.
Status: REOPENED → RESOLVED
Closed: 15 years ago15 years ago
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED
Comment on attachment 408618 [details] [diff] [review]
Use #pragmas to tell MSVC that NS_InvokeByIndex has a frame pointer, rev. 1

Pushed changeset 6801f86775b5 to releases/mozilla-1.9.2
Pushed changeset e3756eea61f0 to releases/mozilla-1.9.1
Comment on attachment 409673 [details] [diff] [review]
Possible patch

We should take this patch on 1.9.1 to get better crash stacks in certain situations there. Very low risk, has been on trunk and 1.9.2 for a while now.
Attachment #409673 - Flags: approval1.9.1.7?
(In reply to comment #35)
> Pushed changeset e3756eea61f0 to releases/mozilla-1.9.1

Er, this changeset doesn't exist on 1.9.1? (Also your patch didn't have approval1.9.1.x anyway.)
(In reply to comment #34)
>(From update of attachment 408618 [details] [diff] [review])
> Pushed changeset 6801f86775b5 to releases/mozilla-1.9.2

(In reply to comment #35)
> Pushed changeset e3756eea61f0 to releases/mozilla-1.9.1

Sorry, both of these were actually pushed to releases/mozilla-1.9.2; the bug history shows me setting the 1.9.2 status flag, and you correcting it ;-)
Comment on attachment 409673 [details] [diff] [review]
Possible patch

Seems like we should take it on 1.9.1.6 if possible.
Attachment #409673 - Flags: approval1.9.1.6?
(In reply to comment #36)
> We should take this patch on 1.9.1 to get better crash stacks in certain
> situations there. Very low risk, has been on trunk and 1.9.2 for a while now.

Has it been proved to work? There were several comments (e.g. comment 29) that expressed doubts. Given that it doesn't seem worth hauling 1.9.1.6 back to the station, we'll catch it in 1.9.1.7
Yes, it works, I verified in comment 33. (Sorry, perhaps that wasn't clear.)
Comment on attachment 409673 [details] [diff] [review]
Possible patch

Approved for 1.9.1.7. a=ss
Attachment #409673 - Flags: approval1.9.1.7? → approval1.9.1.7+
Pushed both patches to 1.9.1:
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/c92853c1c3f1
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/ff4253bc919b

QA: this can be verified by using "crash me now", submitting a crash, and verifying that the resulting stack goes beyond XPCWrappedNative::CallMethod.
Verified fixed on 1.9.1 with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.8pre) Gecko/20100128 Shiretoko/3.5.8pre (.NET CLR 3.5.30729) ID:20100128044603

http://crash-stats.mozilla.com/report/index/0adabf50-3d4b-42de-bb37-973812100128
Keywords: verified1.9.1
Target Milestone: --- → mozilla1.9.3a1
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: