Last Comment Bug 524256 - Crash stacks not unwound past XPCWrappedNative::CallMethod
: Crash stacks not unwound past XPCWrappedNative::CallMethod
Status: VERIFIED FIXED
[crashkill]
: verified1.9.1
Product: Core
Classification: Components
Component: XPCOM (show other bugs)
: Trunk
: All All
: P2 normal (vote)
: mozilla1.9.3a1
Assigned To: neil@parkwaycc.co.uk
:
: Nathan Froyd [:froydnj]
Mentors:
Depends on:
Blocks: 505059 516113 524255 526224
  Show dependency treegraph
 
Reported: 2009-10-23 20:55 PDT by Jesse Ruderman
Modified: 2010-01-28 14:03 PST (History)
15 users (show)
dsicore: blocking1.9.2+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
beta2-fixed
.8-fixed


Attachments
Use #pragmas to tell MSVC that NS_InvokeByIndex has a frame pointer, rev. 1 (1.14 KB, patch)
2009-10-27 09:40 PDT, Benjamin Smedberg [:bsmedberg]
ted: review+
Details | Diff | Splinter Review
Possible patch (1.09 KB, patch)
2009-11-02 04:27 PST, neil@parkwaycc.co.uk
benjamin: review+
samuel.sidler+old: approval1.9.1.8+
Details | Diff | Splinter Review

Description Jesse Ruderman 2009-10-23 20:55:41 PDT
Stacks for many signatures are getting cut off at XPCWrappedNative::CallMethod.

bp-8c98528d-a4d5-4ee5-a0cc-c26332091023  
Bug 505059 -- [@ CSSLoaderImpl::LoadSheet...]

bp-d04a9331-4adc-4240-8435-bc3c52091022  
Bug 524255 -- [@ _purecall | CallQueryInterface...]
  
CCing the heroes of bug 519616.
Comment 1 David Baron :dbaron: ⌚️UTC-8 2009-10-23 21:04:24 PDT
Seems more likely a Gecko bug, in that our assembly-coded NS_InvokeByIndex_P probably isn't providing all the debug information that compiler-generated code does.
Comment 2 Jeff Muizelaar [:jrmuizel] 2009-10-24 06:36:33 PDT
(In reply to comment #1)
> Seems more likely a Gecko bug, in that our assembly-coded NS_InvokeByIndex_P
> probably isn't providing all the debug information that compiler-generated code
> does.

I know it's possible to emit CFI annotations on linux and presumably OS X (http://lkml.indiana.edu/hypermail/linux/kernel/0509.1/0118.html). Has anyone seen anything similar for MSVC?

Still, even without stack unwind information, we should be able to unwind the stack. Does anyone have a minidump of a crash that fails like these ones?
Comment 3 Benjamin Smedberg [:bsmedberg] 2009-10-26 12:05:12 PDT
Assuming this is Windows-only: does NS_InvokeByIndex use a frame pointer? Breakpad (and most debuggers) assume that if debugging information is not present for a function, the function uses a frame pointer in EBP

http://mxr.mozilla.org/mozilla-central/source/xpcom/reflect/xptcall/src/md/win32/xptcinvoke.cpp#79
Comment 4 Jeff Muizelaar [:jrmuizel] 2009-10-26 12:25:08 PDT
(In reply to comment #3)
> Assuming this is Windows-only: does NS_InvokeByIndex use a frame pointer?
> Breakpad (and most debuggers) assume that if debugging information is not
> present for a function, the function uses a frame pointer in EBP

Yes, it appears to use a frame pointer.
Comment 5 Benjamin Smedberg [:bsmedberg] 2009-10-26 12:29:03 PDT
I wonder if declspec(naked) emits some sort of debugging information that isn't correct for the function.
Comment 6 Ted Mielczarek [:ted.mielczarek] 2009-10-26 13:21:51 PDT
Jeff: you can easily produce one using "Crash Me Now!" and grabbing the minidump out of the Crash Reports/pending/ directory before quitting the crash reporter:

Stacks from that look like:
http://crash-stats.mozilla.com/report/index/bp-73cd938b-98ae-4148-9757-4433b2090929
Comment 7 Jeff Muizelaar [:jrmuizel] 2009-10-26 14:03:27 PDT
Running the stackwalker on the minidump from this crash:
http://crash-stats.mozilla.com/report/index/bd254b84-4ed3-4a37-a763-268702091026
gives me a much more useful. Perhaps breakpad was not properly updated?
Comment 8 Jeff Muizelaar [:jrmuizel] 2009-10-26 14:08:05 PDT
The minidump is available at:

http://people.mozilla.org/~jmuizelaar/
Comment 9 Ted Mielczarek [:ted.mielczarek] 2009-10-26 17:09:40 PDT
AFAICT the new code is running in production, evidenced by the lack of 0x0 crashes in the topcrash list.
Comment 10 Ted Mielczarek [:ted.mielczarek] 2009-10-26 17:16:23 PDT
When I run minidump_stackwalk (from SVN tip) on your dump, I just get:
Thread 0 (crashed)
 0  crashme.dll!nsCrasher::Crash() [nsCrasher.cpp : 21 + 0x0]
    eip = 0x009311bd   esp = 0x0013eae0   ebp = 0x0013eae8   ebx = 0x0013eda8
    esi = 0x00000000   edi = 0x00000003   eax = 0x00000003   ecx = 0x038d1120
    edx = 0x00951c00   efl = 0x00010246
    Found by: given as instruction pointer in context
 1  xul.dll!NS_InvokeByIndex_P [xptcinvoke.cpp:ef4565b6f268 : 101 + 0x2]
    eip = 0x50691636   esp = 0x0013eae4   ebp = 0x0013eae8
    Found by: call frame info
 2  xul.dll!XPCWrappedNative::CallMethod(XPCCallContext &,XPCWrappedNative::Call
Mode) [xpcwrappednative.cpp:ef4565b6f268 : 2690 + 0x1a]
    eip = 0x5052e7f1   esp = 0x0013eaf0   ebp = 0x0013eae8
    Found by: call frame info with scanning

in the crashing thread, same as crash-stats. You can see that it resorts to stack scanning to even get frame 2, then it must just get lost.
Comment 11 Jeff Muizelaar [:jrmuizel] 2009-10-26 17:44:41 PDT
(In reply to comment #10)
> When I run minidump_stackwalk (from SVN tip) on your dump, I just get:
> Thread 0 (crashed)
>  0  crashme.dll!nsCrasher::Crash() [nsCrasher.cpp : 21 + 0x0]
>     eip = 0x009311bd   esp = 0x0013eae0   ebp = 0x0013eae8   ebx = 0x0013eda8
>     esi = 0x00000000   edi = 0x00000003   eax = 0x00000003   ecx = 0x038d1120
>     edx = 0x00951c00   efl = 0x00010246
>     Found by: given as instruction pointer in context
>  1  xul.dll!NS_InvokeByIndex_P [xptcinvoke.cpp:ef4565b6f268 : 101 + 0x2]
>     eip = 0x50691636   esp = 0x0013eae4   ebp = 0x0013eae8
>     Found by: call frame info
>  2  xul.dll!XPCWrappedNative::CallMethod(XPCCallContext
> &,XPCWrappedNative::Call
> Mode) [xpcwrappednative.cpp:ef4565b6f268 : 2690 + 0x1a]
>     eip = 0x5052e7f1   esp = 0x0013eaf0   ebp = 0x0013eae8
>     Found by: call frame info with scanning
> 
> in the crashing thread, same as crash-stats. You can see that it resorts to
> stack scanning to even get frame 2, then it must just get lost.

The reason you run into the problem is because you have symbols for crashme.dll. I didn't have those so I was going from Crash() to CallMethod() and using the frame pointer from there.

Why does the crash server have symbols for crashme.dll and how do I get them? They don't seem to be listed in symbol index file.
Comment 12 Ted Mielczarek [:ted.mielczarek] 2009-10-27 03:38:17 PDT
Ah, interesting. I manually uploaded the crashme symbols after I built the extension. They're in the same directory as our OS symbols, which isn't available via HTTP. I copied it over to people if you want it:
http://people.mozilla.org/~tmielczarek/crashme.sym
Comment 13 Ted Mielczarek [:ted.mielczarek] 2009-10-27 05:51:43 PDT
Yeah, that is interesting. If I process it without the crashme symbols it gets the full stack. I guess the NS_InvokeByIndex_P frame is kind of a tar pit. Regardless, the crashes in comment 0 have the same exact problem exhibited by your minidump when we have symbols for the top of the stack, so we should figure out why this is. Here's the top 4 frames I get when I process without crashme symbols:
Thread 0 (crashed)
 0  crashme.dll + 0x11bd
    eip = 0x009311bd   esp = 0x0013eae0   ebp = 0x0013eae8   ebx = 0x0013eda8
    esi = 0x00000000   edi = 0x00000003   eax = 0x00000003   ecx = 0x038d1120
    edx = 0x00951c00   efl = 0x00010246
    Found by: given as instruction pointer in context
 1  xul.dll!XPCWrappedNative::CallMethod(XPCCallContext &,XPCWrappedNative::Call
Mode) [xpcwrappednative.cpp:ef4565b6f268 : 2690 + 0x1a]
    eip = 0x5052e7f1   esp = 0x0013eaf0   ebp = 0x0013ed78
    Found by: previous frame's frame pointer
 2  xul.dll!XPC_WN_CallMethod(JSContext *,JSObject *,unsigned int,int *,int *) [
xpcwrappednativejsops.cpp:ef4565b6f268 : 1756 + 0x11]
    eip = 0x5052ad40   esp = 0x0013ed80   ebp = 0x035010b4
    Found by: previous frame's frame pointer
 3  mozjs.dll!js_Invoke [jsinterp.cpp:ef4565b6f268 : 1365 + 0x14]
    eip = 0x623a0aca   esp = 0x0013ee4c   ebp = 0x0013eee8
    Found by: call frame info
 4  mozjs.dll!js_Interpret [jsops.cpp:ef4565b6f268 : 2291 + 0xd]
    eip = 0x623986e1   esp = 0x0013eef0   ebp = 0x0013f0e4
    Found by: previous frame's frame pointer

I poked through the debug info a little bit, and here's a brain dump. NS_InvokeByIndex_P looks like this:
FILE 338447 hg:hg.mozilla.org/mozilla-central:xpcom/reflect/xptcall/src/md/win32/xptcinvoke.cpp:ef4565b6f268
FUNC 27160f 2b 10 NS_InvokeByIndex_P
STACK WIN 4 27160f 2b 0 0 10 0 0 0 1 $T2 $esp .cbLocals + .cbSavedRegs + = $T0 .raSearchStart = $eip $T0 ^ = $esp $T0 4 + =

That program string says:
$T0 = .raSearchStart
$eip = *$T0
$esp = $T0 + 4
(Where .raSearchStart is http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/stackwalker_x86.cc#160, 0x13eae8 in this specific case)
The value of %eip comes out bogus here (0x13ed78), so we resort to stack scanning, which does find the next frame. However, it gives us %eip = 0x5052e7f1 (rva = 0x10e7f1), which is actually way past the end of XPC_WN_CallMethod (rva = 0x10aba0, size = 0x26a). Since the IP isn't actually contained in this function, we fail to find the FPO data for it here:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/basic_source_line_resolver.cc#427

Which means we have to resort to %ebp walking, but our value of %ebp is clearly not valid for this frame, so we just fail. I think this indicates that there's something failing in the debug info for NS_InvokeByIndex_P, since we should be able to find the next frame from there strictly from program string evaluation. I wouldn't be surprised to find that the __declspec(naked) is doing something weird.
Comment 14 Benjamin Smedberg [:bsmedberg] 2009-10-27 06:35:36 PDT
Yeah, that program string is wrong for NS_InvokeByIndex: no reference to $ebp, it's assuming the function doesn't have a frame pointer because optimizations are on and it doesn't see the assembly. I have a patch which might work.
Comment 15 Ted Mielczarek [:ted.mielczarek] 2009-10-27 06:59:18 PDT
I don't think we should spend a lot of time on this, while it's clearly a bug, I don't think the resulting stacks are likely to help much with crash analysis. For example, the rest of the stack in this "crash me now" crash would just tell you that JavaScript was invoked from the event loop.

(But if bsmedberg's patch works, then great, let's just fix it.)
Comment 16 Jesse Ruderman 2009-10-27 07:40:11 PDT
It could help in bug 505059 by telling us where we are in shutdown, or in bug 524255 by telling us if the _purecall is due to there being a destructor on the stack.  And it would dispel the feeling that the stacks are "busted" and therefore ignorable.
Comment 17 Blake Kaplan (:mrbkap) 2009-10-27 07:45:06 PDT
Also, XPCWrappedNative is called from a lot of places (we do sometimes do have stacks that go C++ -> JS -> C++) and having that extra context can be useful.
Comment 18 Benjamin Smedberg [:bsmedberg] 2009-10-27 09:40:20 PDT
Created attachment 408618 [details] [diff] [review]
Use #pragmas to tell MSVC that NS_InvokeByIndex has a frame pointer, rev. 1

#pragmas FTW
This basically suppresses the custom STACK operations for NS_InvokeByIndex so that the default frame-pointer operation takes over.
Comment 19 Ted Mielczarek [:ted.mielczarek] 2009-10-27 09:58:43 PDT
Comment on attachment 408618 [details] [diff] [review]
Use #pragmas to tell MSVC that NS_InvokeByIndex has a frame pointer, rev. 1

Nice!
Comment 20 Ted Mielczarek [:ted.mielczarek] 2009-10-27 11:10:17 PDT
Comment on attachment 408618 [details] [diff] [review]
Use #pragmas to tell MSVC that NS_InvokeByIndex has a frame pointer, rev. 1

We should land this on branches too. It's very simple, doesn't change any behavior, and will help us get better crash stacks immediately.
Comment 21 Ted Mielczarek [:ted.mielczarek] 2009-10-28 03:21:47 PDT
Landed in m-c:
http://hg.mozilla.org/mozilla-central/rev/15f4223cd977
Comment 22 Benjamin Smedberg [:bsmedberg] 2009-10-28 10:01:40 PDT
Can someone verify using today's nightly that this in fact is fixed, before I start on the branches?
Comment 23 Ted Mielczarek [:ted.mielczarek] 2009-10-28 10:18:16 PDT
Today's nightly shows:
FUNC 2763cf 2b 10 NS_InvokeByIndex_P
STACK WIN 4 2763cf 2b 0 0 10 0 0 0 1 $T2 $esp .cbLocals + .cbSavedRegs + = $T0 .
raSearchStart = $eip $T0 ^ = $esp $T0 4 + =

Perhaps WPO or PGO (or both) are screwing you up here? I notice the optimize pragma takes a "g" for "global optimizations". Might be worth a shot.
Comment 24 Ted Mielczarek [:ted.mielczarek] 2009-10-28 10:19:24 PDT
(Also, I tried the obvious "process a minidump from crash me now produced with today's nightly" before looking at the symbol files.)
Comment 25 Benjamin Smedberg [:bsmedberg] 2009-10-28 15:55:31 PDT
Other than "delete the NS_InvokeByIndex_P lines from the .sym file" I don't know what to do with this next.
Comment 26 neil@parkwaycc.co.uk 2009-11-02 04:27:50 PST
Created attachment 409673 [details] [diff] [review]
Possible patch

Maybe allowing the compiler to generate the prologue and epilogue would help.

[I checked and the #pragma forces it even in opt builds.]
Comment 27 Ted Mielczarek [:ted.mielczarek] 2009-11-02 04:32:39 PST
Comment on attachment 409673 [details] [diff] [review]
Possible patch

I am not qualified to review this. Also, if you can make a try build with this, I can test to see if it fixes the problem.
Comment 28 Ted Mielczarek [:ted.mielczarek] 2009-11-02 04:33:00 PST
Comment on attachment 408618 [details] [diff] [review]
Use #pragmas to tell MSVC that NS_InvokeByIndex has a frame pointer, rev. 1

Clearing approvals since this patch didn't fix our problem.
Comment 29 Benjamin Smedberg [:bsmedberg] 2009-11-02 12:47:54 PST
Comment on attachment 409673 [details] [diff] [review]
Possible patch

I'd be a little surprised if this actually worked, but I don't think it can hurt.
Comment 30 Johnny Stenback (:jst, jst@mozilla.com) 2009-11-03 18:07:50 PST
(In reply to comment #25)
> Other than "delete the NS_InvokeByIndex_P lines from the .sym file" I don't
> know what to do with this next.

Can we give that a try and see if that helps? This is blocking investigation of a couple of top crash bugs...
Comment 31 Ted Mielczarek [:ted.mielczarek] 2009-11-03 18:12:17 PST
There's a patch here, we should see if this fixes the issue before trying other crazy things.
Comment 32 neil@parkwaycc.co.uk 2009-11-05 03:58:07 PST
Comment on attachment 409673 [details] [diff] [review]
Possible patch

Pushed changeset c0d760c5165c to mozilla-central.
Comment 33 Ted Mielczarek [:ted.mielczarek] 2009-11-05 11:40:23 PST
With today's nightly:
http://crash-stats.mozilla.com/report/index/4155267b-3ea7-4ec8-a3ed-ee14f2091105

Looks good, although we still get hung up at nsRunnable::Release for some reason. We can handle that separately though. This bug is blocking1.9.2+, so feel free to land on 1.9.2 at your leisure.
Comment 34 neil@parkwaycc.co.uk 2009-11-05 14:44:52 PST
Comment on attachment 408618 [details] [diff] [review]
Use #pragmas to tell MSVC that NS_InvokeByIndex has a frame pointer, rev. 1

Pushed changeset 6801f86775b5 to releases/mozilla-1.9.2
Comment 35 neil@parkwaycc.co.uk 2009-11-05 14:45:32 PST
Pushed changeset e3756eea61f0 to releases/mozilla-1.9.1
Comment 36 Ted Mielczarek [:ted.mielczarek] 2009-11-17 04:31:17 PST
Comment on attachment 409673 [details] [diff] [review]
Possible patch

We should take this patch on 1.9.1 to get better crash stacks in certain situations there. Very low risk, has been on trunk and 1.9.2 for a while now.
Comment 37 Ted Mielczarek [:ted.mielczarek] 2009-11-17 08:15:27 PST
(In reply to comment #35)
> Pushed changeset e3756eea61f0 to releases/mozilla-1.9.1

Er, this changeset doesn't exist on 1.9.1? (Also your patch didn't have approval1.9.1.x anyway.)
Comment 38 neil@parkwaycc.co.uk 2009-11-17 08:26:44 PST
(In reply to comment #34)
>(From update of attachment 408618 [details] [diff] [review])
> Pushed changeset 6801f86775b5 to releases/mozilla-1.9.2

(In reply to comment #35)
> Pushed changeset e3756eea61f0 to releases/mozilla-1.9.1

Sorry, both of these were actually pushed to releases/mozilla-1.9.2; the bug history shows me setting the 1.9.2 status flag, and you correcting it ;-)
Comment 39 David Baron :dbaron: ⌚️UTC-8 2009-11-17 09:05:01 PST
Comment on attachment 409673 [details] [diff] [review]
Possible patch

Seems like we should take it on 1.9.1.6 if possible.
Comment 40 Daniel Veditz [:dveditz] 2009-11-19 15:54:23 PST
(In reply to comment #36)
> We should take this patch on 1.9.1 to get better crash stacks in certain
> situations there. Very low risk, has been on trunk and 1.9.2 for a while now.

Has it been proved to work? There were several comments (e.g. comment 29) that expressed doubts. Given that it doesn't seem worth hauling 1.9.1.6 back to the station, we'll catch it in 1.9.1.7
Comment 41 Ted Mielczarek [:ted.mielczarek] 2009-11-19 15:59:13 PST
Yes, it works, I verified in comment 33. (Sorry, perhaps that wasn't clear.)
Comment 42 Samuel Sidler (old account; do not CC) 2009-12-03 06:15:32 PST
Comment on attachment 409673 [details] [diff] [review]
Possible patch

Approved for 1.9.1.7. a=ss
Comment 43 Ted Mielczarek [:ted.mielczarek] 2009-12-03 06:24:32 PST
Pushed both patches to 1.9.1:
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/c92853c1c3f1
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/ff4253bc919b

QA: this can be verified by using "crash me now", submitting a crash, and verifying that the resulting stack goes beyond XPCWrappedNative::CallMethod.
Comment 44 Henrik Skupin (:whimboo) 2010-01-28 14:03:06 PST
Verified fixed on 1.9.1 with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.8pre) Gecko/20100128 Shiretoko/3.5.8pre (.NET CLR 3.5.30729) ID:20100128044603

http://crash-stats.mozilla.com/report/index/0adabf50-3d4b-42de-bb37-973812100128

Note You need to log in before you can comment on or make changes to this bug.