Closed
Bug 519616
Opened 15 years ago
Closed 15 years ago
Some crashes don't get unwound by the minidump processor usefully [@ @0x0] [@ @0x1]
Categories
(Socorro :: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jrmuizel, Assigned: ted)
References
Details
(Whiteboard: [crashkill])
Attachments
(2 files)
1.71 KB, patch
9.04 KB, patch
One example is crash d6eddf4e-a71e-4efe-b796-924112090928.
The minidump has 5368 bytes of stack for the crashing thread, but we currently don't get any useful data out of it at all. We should be able to do better.
Comment 1 • 15 years ago (Reporter)
Here's a very quick and dirty manual unwind of the stack.
Not everything below is necessarily sane but it looks like we're crashing during plugin initialization.
google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread(_EXCEPTION_POINTERS *,MDRawAssertionInfo *)
google_breakpad::ExceptionHandler::HandleException(_EXCEPTION_POINTERS *)
...
ComputeBorderCornerDimensions
nsCOMPtr_base::~nsCOMPtr_base()
...
nsPluginInstanceOwner::GetMode(nsPluginMode *)
nsNPAPIPluginInstance::InitializePlugin(nsIPluginInstancePeer *)
nsPluginHostImpl::TrySetUpPluginInstance(char const *,nsIURI *,nsIPluginInstanceOwner *)
nsCOMPtr_base::assign_from_qi(nsQueryInterface,nsID const &)
PL_DHashMatchStringKey
PL_DHashTableOperate
nsComponentManagerImpl::GetFactoryEntry(char const *,unsigned int
nsComponentManagerImpl::IsContractIDRegistered(char const *,int *)
SearchTable
NS_TableDrivenQI(void *,QITableEntry const *,nsID const &,void * *)
nsPluginHostImpl::QueryInterface(nsID const &,void * *)
etc...
Comment 2 • 15 years ago (Assignee)
WinDBG gives a similar stack:
ChildEBP RetAddr
0012eb08 7c90df5a ntdll!KiFastSystemCallRet
0012eb0c 7c8025db ntdll!ZwWaitForSingleObject+0xc
0012eb70 7c802542 kernel32!WaitForSingleObjectEx+0xa8
0012eb84 103a4236 kernel32!WaitForSingleObject+0x12
0012eb9c 1048ec2b xul!google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread(struct _EXCEPTION_POINTERS * exinfo = 0x0012eec0, struct MDRawAssertionInfo * assertion = 0x0caa17fe)+0x6a [e:\builds\moz2_slave\win32_build\build\toolkit\crashreporter\google-breakpad\src\client\windows\handler\exception_handler.cc @ 562]
0012ebc0 6681f1b7 xul!google_breakpad::ExceptionHandler::HandleException(struct _EXCEPTION_POINTERS * exinfo = 0x0012eec0)+0x53 [e:\builds\moz2_slave\win32_build\build\toolkit\crashreporter\google-breakpad\src\client\windows\handler\exception_handler.cc @ 394]
WARNING: Stack unwind information not available. Following frames may be wrong.
0012ee98 7c8438fa QuickTime+0x1f1b7
0012eea0 7c839b39 kernel32!BaseProcessStart+0x39
0012eec8 7c9032a8 kernel32!_except_handler3+0x61
0012eeec 7c90327a ntdll!ExecuteHandler2+0x26
0012ef9c 7c90e48a ntdll!ExecuteHandler+0x24
0012ef9c 00000000 ntdll!KiUserExceptionDispatcher+0xe
0012f298 669c22d7 0x0
0012f32c 100b502e QuickTime+0x1c22d7
0012f334 10699db7 xul!nsCOMPtr_base::~nsCOMPtr_base(void)+0xe [e:\builds\moz2_slave\win32_build\build\obj-firefox\xpcom\build\nscomptr.cpp @ 82]
0012f344 1070c805 xul!nsPluginInstanceOwner::GetMode(nsPluginMode * aMode = <Memory access error>)+0x61 [e:\builds\moz2_slave\win32_build\build\layout\generic\nsobjectframe.cpp @ 2362]
00000000 00000000 xul!nsNPAPIPluginInstance::InitializePlugin(class nsIPluginInstancePeer * peer = <Memory access error>)+0x238 [e:\builds\moz2_slave\win32_build\build\modules\plugin\base\src\nsnpapiplugininstance.cpp @ 1030]
Note that, of course, Breakpad doesn't just walk the stack from the top on the crashed thread, but starts from the register state in the exception context:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/minidump_processor.cc#167
Comment 3 • 15 years ago (Assignee)
!analyze -v gives, among other things:
STACK_TEXT:
WARNING: Frame IP not in any known module. Following frames may be wrong.
0012f298 669c22d7 0012f318 0012f310 00000000 0x0
0012f32c 100b502e 0407cc00 10699db7 0416a220 QuickTime+0x1c22d7
0012f334 10699db7 0416a220 0416a220 00000000 xul!nsCOMPtr_base::~nsCOMPtr_base+0xe [e:\builds\moz2_slave\win32_build\build\obj-firefox\xpcom\build\nscomptr.cpp @ 82]
0012f344 1070c805 07d18f20 076cb418 00000001 xul!nsPluginInstanceOwner::GetMode+0x61 [e:\builds\moz2_slave\win32_build\build\layout\generic\nsobjectframe.cpp @ 2362]
00000000 00000000 00000000 00000000 00000000 xul!nsNPAPIPluginInstance::InitializePlugin+0x238 [e:\builds\moz2_slave\win32_build\build\modules\plugin\base\src\nsnpapiplugininstance.cpp @ 1030]
QuickTime, j'accuse! Now, we just need to figure out why WinDBG can walk this stack and Breakpad can't.
Comment 4 • 15 years ago (Assignee)
Note that if you open a dump in WinDBG, you can use ".excr" to set your current context to the exception context, such that your displayed register state and call stack will be from the exception context.
Comment 5 • 15 years ago (Assignee)
Also, to add to the fun in this particular report:
http://crash-stats.mozilla.com/report/index/d6eddf4e-a71e-4efe-b796-924112090928
cvasds1.dll
e8main1.dll
are apparently from Trojan.Dropper/Gen-NV.
Comment 6 • 15 years ago (Assignee)
A little bit of examination shows that we hit this block in the stack walker:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/stackwalker_x86.cc#257
Since %eip is 0x0, which is clearly not in a known module, we get shunted to the "standard calling convention" path. However, %ebp is off in the weeds (0x6170706c), so all of those conditions fail, and the stack walker gives up and returns NULL:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/stackwalker_x86.cc#293
There must be a smarter way to handle this particular situation. From running minidump_dump on the dump, we have:
stack.start_of_memory_range = 0x12eb08
stack.memory.data_size = 0x14f8
and %esp looks sane: 0x12f29c. I think the stack walker just needs to try a little harder here.
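The sanity check described above can be sketched in a few lines. This is only an illustration using the values quoted in this comment; the helper name `in_stack` is made up here and is not part of Breakpad.

```python
# Values copied from the minidump_dump output and the exception context above.
STACK_START = 0x12EB08   # stack.start_of_memory_range
STACK_SIZE = 0x14F8      # stack.memory.data_size (5368 bytes)
ESP = 0x0012F29C         # %esp from the exception context
EBP = 0x6170706C         # %ebp from the exception context

def in_stack(addr, start=STACK_START, size=STACK_SIZE):
    """Return True if a 4-byte word at addr lies entirely inside the captured stack."""
    return start <= addr and addr + 4 <= start + size

stack_end = STACK_START + STACK_SIZE
print(hex(stack_end))    # end of the captured stack memory
print(in_stack(ESP))     # %esp points into the captured stack
print(in_stack(EBP))     # %ebp does not, so frame-pointer walking cannot work
print(stack_end - ESP)   # bytes at and above %esp that a walker could still inspect
```

Since %esp is inside the captured range while %ebp is not, any recovery has to start from %esp rather than from the frame pointer.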
Comment 7 • 15 years ago (Reporter)
With this quick patch, I get the following stack:
0 0x0
eip = 0x00000000 esp = 0x0012f29c ebp = 0x6170706c ebx = 0x00000000
esi = 0x0800b5fe edi = 0x000001f1 eax = 0x00040002 ecx = 0x07fa4ff0
edx = 0x07ff67e0 efl = 0x00010286
1 QuickTime.qts + 0x1c22d6
eip = 0x669c22d7 esp = 0x0012f2a0 ebp = 0x6170706c
2 QuickTime.qts + 0x98225
eip = 0x66898226 esp = 0x0012f2d0 ebp = 0x6170706c
3 QuickTime.qts + 0x1c1f2f
eip = 0x669c1f30 esp = 0x0012f2f0 ebp = 0x6170706c
4 QuickTime.qts + 0x11b2af
eip = 0x6691b2b0 esp = 0x0012f308 ebp = 0x6170706c
5 QuickTimeWebHelper.qtx + 0x9f3c
eip = 0x675a9f3d esp = 0x0012f318 ebp = 0x6170706c
6 xul.dll!nsCOMPtr_base::~nsCOMPtr_base() [nsCOMPtr.cpp:c6f51c76fb5d : 81 + 0x7]
eip = 0x100b502e esp = 0x0012f334 ebp = 0x6170706c
7 xul.dll!nsPluginInstanceOwner::GetMode(nsPluginMode *) [nsObjectFrame.cpp:c6f51c76fb5d : 2362 + 0xf]
eip = 0x10699db7 esp = 0x0012f33c ebp = 0x6170706c
8 xul.dll!nsNPAPIPluginInstance::InitializePlugin(nsIPluginInstancePeer *) [nsNPAPIPluginInstance.cpp:c6f51c76fb5d : 1030 + 0x4f]
eip = 0x1070c805 esp = 0x0012f34c ebp = 0x6170706c
9 xul.dll + 0x9a49bb
eip = 0x109a49bc esp = 0x0012f37c ebp = 0x6170706c
10 xul.dll + 0x9a49bb
eip = 0x109a49bc esp = 0x0012f380 ebp = 0x6170706c
11 mozcrt19.dll!arena_bin_nonfull_run_get [jemalloc.c:c6f51c76fb5d : 3795 + 0x6]
eip = 0x78139637 esp = 0x0012f38c ebp = 0x6170706c
12 xul.dll + 0x29bae5
eip = 0x1029bae6 esp = 0x0012f3c8 ebp = 0x0012f444
13 xul.dll + 0x29bcd7
eip = 0x1029bcd8 esp = 0x0012f44c ebp = 0x0012f740
14 xul.dll + 0x299af2
eip = 0x10299af3 esp = 0x0012f748 ebp = 0x0012f934
15 xul.dll + 0x29c526
eip = 0x1029c527 esp = 0x0012f93c ebp = 0x0012f994
16 xul.dll + 0x29c6bb
eip = 0x1029c6bc esp = 0x0012f99c ebp = 0x0012ffb0
17 firefox.exe!_IsNonwritableInCurrentImage + 0xd
eip = 0x004018e8 esp = 0x0012ffb8 ebp = 0x0012ffe0
18 kernel32.dll + 0x39ad7
eip = 0x7c839ad8 esp = 0x0012ffe8 ebp = 0xffffffff
19 kernel32.dll + 0x1707f
eip = 0x7c817080 esp = 0x0012ffec ebp = 0xffffffff
20 firefox.exe!pre_c_init + 0x3
eip = 0x004015b0 esp = 0x0012fffc ebp = 0xffffffff
Comment 8 • 15 years ago (Assignee)
I implemented the TODO mentioned in this patch:
http://people.mozilla.com/~tmielczarek/stackwalker-guess-harder.patch
I don't have all the symbols for that version of Firefox handy though, so Jeff was going to test it.
Comment 9 • 15 years ago (Reporter)
(In reply to comment #8)
> I don't have all the symbols for that version of Firefox handy though, so Jeff
> was going to test it.
I couldn't get that patch to help. Haven't looked into why yet.
One of the other things we can try to do is get a useful value into ebp. I tried this, but was getting worse results because of something that looks like a frame pointer in QuickTime. I'm not sure how to fix that yet.
Comment 10 • 15 years ago (Reporter)
One of the big problems with the current patch is that once we are able to walk through the QuickTime stack using brute force, we're never able to use the more elegant ways of stackwalking. This is what's giving us a bunch of false positives.
Ideally, we could get back on track and start using the FPO data again.
Comment 11 • 15 years ago (Assignee)
Yeah, that's a bummer. We just sort of stumble our way back into a random part of libxul and then we're off by just enough to get a crummy stack from there, right?
Comment 12 • 15 years ago (Reporter)
(In reply to comment #11)
> Yeah, that's a bummer. We just sort of stumble our way back into a random part
> of libxul and then we're off by just enough to get a crummy stack from there,
> right?
Yeah, I think so.
It also looks like we don't have frame data for all the functions. For example, if you look at xul.sym, we have FPO data for nsCOMPtr_base::~nsCOMPtr_base()
but not for nsPluginInstanceOwner::GetMode() or nsNPAPIPluginInstance::InitializePlugin().
Further, do you know why we get STACK_INFO_FRAME_DATA records for some code and STACK_INFO_FPO for other code?
Comment 13 • 15 years ago (Assignee)
dump_syms just dumps out whatever (underdocumented) data is in the PDB files using the DIA APIs:
http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/google-breakpad/src/common/windows/pdb_source_line_writer.cc#272
You can see the STACK WIN lines in any Windows symbol file:
http://symbols.mozilla.org/firefox/firefox.pdb/B9F3DF1DC69045E29B7E9877E67F99EC2/firefox.sym
Some of them have program strings, some don't. I guess it just depends on what the compiler decided to do.
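One way to see how much unwind data a symbol file actually carries is to tally its STACK WIN records. The sketch below assumes the usual Breakpad text layout (`STACK WIN type rva code_size prologue_size epilogue_size parameter_size saved_register_size local_size max_stack_size has_program_string ...`, all fields in hex) and assumes that type 0 means FPO and type 4 means FRAME_DATA. The sample lines are made up for illustration and do not come from any Firefox symbol file.

```python
from collections import Counter

# Assumed mapping of STACK WIN type numbers; other values are reported by number.
FRAME_TYPES = {0: "FPO", 4: "FRAME_DATA"}

def summarize_stack_win(lines):
    """Tally STACK WIN records by type and report the highest RVA they cover."""
    by_type = Counter()
    with_program_string = 0
    max_rva = 0
    for line in lines:
        fields = line.split(None, 12)
        if len(fields) < 12 or fields[0] != "STACK" or fields[1] != "WIN":
            continue  # not a STACK WIN record
        frame_type = int(fields[2], 16)
        rva = int(fields[3], 16)
        by_type[FRAME_TYPES.get(frame_type, str(frame_type))] += 1
        if fields[11] != "0":  # has_program_string flag
            with_program_string += 1
        max_rva = max(max_rva, rva)
    return {
        "by_type": dict(by_type),
        "with_program_string": with_program_string,
        "max_rva": max_rva,
    }

# Illustrative input only.
SAMPLE = """\
FUNC 1000 20 8 example_function
STACK WIN 4 1000 20 3 0 8 0 0 0 1 $T0 .raSearch = $eip $T0 ^ = $esp $T0 4 + =
STACK WIN 4 2170 14 1 0 0 0 0 0 1 $eip 4 + ^ = $esp $ebp 8 + = $ebp $ebp ^ =
STACK WIN 0 3000 10 0 0 4 0 0 0 0 0
"""
summary = summarize_stack_win(SAMPLE.splitlines())
print(summary)
```

For a real file, passing the open file object (for example `summarize_stack_win(open("xul.sym"))`) should work the same way. A `max_rva` far below the size of the module would suggest that large parts of the code have no STACK WIN record at all.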
Comment 14 • 15 years ago (Reporter)
(In reply to comment #13)
> You can see the STACK WIN lines in any Windows symbol file:
> http://symbols.mozilla.org/firefox/firefox.pdb/B9F3DF1DC69045E29B7E9877E67F99EC2/firefox.sym
> Some of them have program strings, some don't. I guess it just depends on what
> the compiler decided to do.
But some have no entry at all. Which means we can't really unwind the stack very well. Is it possible there's a bug someplace that's preventing us from getting entries for some code?
Comment 15 • 15 years ago (Assignee)
You can look at that pdb_source_line_writer code, it's a pretty straightforward application of the DIA APIs.
Comment 16 • 15 years ago (Reporter)
I had a look at the PDBs with dia2dump, and the ones that I downloaded from the symbol server have significantly fewer FPO entries than the one that I built (10098 lines of output vs. 160667 lines of output).
Assuming the stack RVAs are in order, the stack RVAs in the downloaded ones only go up to about 0x29ccb0 instead of the 0x77fa4d you would expect. No idea what would cause this.
Comment 17 • 15 years ago (Reporter)
Could the difference be related to the following linker flags? '-DEBUG -DEBUGTYPE:CV'
Comment 18 • 15 years ago (Reporter)
"On the command line, if /DEBUG is specified, the default type is /DEBUGTYPE:CV; if /DEBUG is not specified, /DEBUGTYPE is ignored." so perhaps not.
Comment 19 • 15 years ago (Assignee)
Should we split off a separate bug for the actual crash we identified here, involving QuickTime?
Comment 20 • 15 years ago (Reporter)
(In reply to comment #19)
> Should we split off a separate bug for the actual crash we identified here,
> involving QuickTime?
I did so as bug 520650.
Comment 21 • 15 years ago (Reporter)
I've also split off the missing stack unwind info bug as bug 520651.
Comment 22 • 15 years ago (Reporter)
(In reply to comment #19)
> Should we split off a separate bug for the actual crash we identified here,
> involving QuickTime?
Yes, we probably should.
Comment 23 • 15 years ago (Reporter)
These crashes also aren't very useful:
http://crash-stats.mozilla.com/report/index/bp-622aa543-7a0a-491e-854d-120cd2091005
http://crash-stats.mozilla.com/report/index/7c669bd2-c464-42eb-8fd2-ae9552091005
For example, XUL doesn't have symbols and I'm not sure why.
Comment 24 • 15 years ago (Assignee)
Looks like a different problem, but also something is confused there:
Version 3.7a1pre
Branch 1.9.2
those do not match!
Comment 25 • 15 years ago
(In reply to comment #23)
> These crashes also aren't very useful:
> ...
> For example, XUL doesn't have symbols and I'm not sure why.
I have seen a similar example in bug 512810 where crash reports from nightly builds appear to be missing symbols for xul but the reports from 3.6a1 work (sort of).
(In reply to comment #24)
> Looks like a different problem, but also something is confused there:
> Version 3.7a1pre
> Branch 1.9.2
> those do not match!
Looking at a selection of crash reports for trunk, they all seem to have branch = 1.9.2
Comment 26 • 15 years ago (Assignee)
I filed that issue as bug 520852.
Comment 27 • 15 years ago (Reporter)
Here's the stack with my work in progress patch and a description of what's going on:
Thread 0 (crashed)
0 0x0
eip = 0x00000000 esp = 0x0012f29c ebp = 0x6170706c ebx = 0x00000000
esi = 0x0800b5fe edi = 0x000001f1 eax = 0x00040002 ecx = 0x07fa4ff0
edx = 0x07ff67e0 efl = 0x00010286
trust: none
1 QuickTime.qts + 0x1c22d6
eip = 0x669c22d7 esp = 0x0012f2a0 ebp = 0x6170706c
trust: scan
2 QuickTime.qts + 0x98225
eip = 0x66898226 esp = 0x0012f2d0 ebp = 0x6170706c
trust: scan
3 QuickTime.qts + 0x1c1f2f
eip = 0x669c1f30 esp = 0x0012f2f0 ebp = 0x6170706c
trust: scan
4 QuickTime.qts + 0x11b2af
eip = 0x6691b2b0 esp = 0x0012f308 ebp = 0x6170706c
trust: scan
5 QuickTimeWebHelper.qtx + 0x9f3c
eip = 0x675a9f3d esp = 0x0012f318 ebp = 0x6170706c
trust: scan
6 xul.dll!nsCOMPtr_base::~nsCOMPtr_base() [nsCOMPtr.cpp:c6f51c76fb5d : 81 + 0x7]
eip = 0x100b502e esp = 0x0012f334 ebp = 0x6170706c
trust: scan
* Everything looks good to this point. The new 'scan' method lets us unwind through the QuickTime stack. Note: it looks like QuickTime has been compiled without a frame pointer.
7 xul.dll!nsPluginInstanceOwner::GetMode(nsPluginMode *) [nsObjectFrame.cpp:c6f51c76fb5d : 2362 + 0xf]
eip = 0x10699db7 esp = 0x0012f33c ebp = 0x6170706c
trust: cfi_scan
* In frame 7 we can use the unwind info from frame 6. I'm not exactly sure why we have to revert to scanning though.
8 xul.dll!nsNPAPIPluginInstance::InitializePlugin(nsIPluginInstancePeer *) [nsNPAPIPluginInstance.cpp:c6f51c76fb5d : 1030 + 0x4f]
eip = 0x1070c805 esp = 0x0012f34c ebp = 0x6170706c
trust: scan
* Frame 7 is missing unwind info, so we're forced back into the scan method
9 mozcrt19.dll!arena_bin_nonfull_run_get [jemalloc.c:c6f51c76fb5d : 3795 + 0x6]
eip = 0x78139637 esp = 0x0012f38c ebp = 0x6170706c
trust: scan
10 xul.dll + 0x29bae5
eip = 0x1029bae6 esp = 0x0012f3c8 ebp = 0x0012f444
trust: cfi_scan
* Frame 9 has unwind info so we try to use it again. Unfortunately, it also has something that looks like a frame pointer on the stack, so ebp gets set to that.
11 xul.dll + 0x29bcd7
eip = 0x1029bcd8 esp = 0x0012f44c ebp = 0x0012f740
trust: fp
* Since we now have a frame pointer (or something that looks like one) we use that. This brings us through the 0x29.... range of code. I'm not sure what's actually here because xul.sym does not have any symbols for this range. The frame pointer unwinder doesn't use AddressSeemsValid() so we don't have symbols for these frames.
12 xul.dll + 0x299af2
eip = 0x10299af3 esp = 0x0012f748 ebp = 0x0012f934
trust: fp
13 xul.dll + 0x29c526
eip = 0x1029c527 esp = 0x0012f93c ebp = 0x0012f994
trust: fp
14 xul.dll + 0x29c6bb
eip = 0x1029c6bc esp = 0x0012f99c ebp = 0x0012ffb0
trust: fp
* More of the same.
15 firefox.exe!_IsNonwritableInCurrentImage + 0xd
eip = 0x004018e8 esp = 0x0012ffb8 ebp = 0x0012ffe0
trust: fp
16 kernel32.dll + 0x39ad7
eip = 0x7c839ad8 esp = 0x0012ffe8 ebp = 0xffffffff
trust: fp
17 kernel32.dll + 0x1707f
eip = 0x7c817080 esp = 0x0012ffec ebp = 0xffffffff
trust: scan
18 firefox.exe!pre_c_init + 0x3
eip = 0x004015b0 esp = 0x0012fffc ebp = 0xffffffff
trust: scan
* I've not looked at these last few frames in detail.
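The 'scan' trust level above comes from a brute-force search of the stack for something that looks like a return address. The following is a minimal sketch of that idea, not Breakpad's actual code. The module base addresses follow from the traces in this bug (for example 0x669c22d7 - 0x1c22d7 = 0x66800000), but the module sizes are placeholders, and the four stack words are the ones WinDBG shows at 0x12f29c in comment 3. The function name is hypothetical.

```python
import struct

# (name, base, size). Bases are derived from the traces; sizes are assumed.
MODULES = [
    ("QuickTime.qts", 0x66800000, 0x00900000),
    ("xul.dll", 0x10000000, 0x00C00000),
]

def scan_for_return_address(stack, stack_base, esp, modules):
    """Return (caller_eip, caller_esp) for the first stack word at or above
    esp that points into a known module, or None if no such word is found."""
    offset = esp - stack_base
    if offset < 0:
        return None
    while offset + 4 <= len(stack):
        (word,) = struct.unpack_from("<I", stack, offset)  # little-endian 32-bit
        if any(base <= word < base + size for _, base, size in modules):
            # Popping the return address leaves esp just past this word.
            return word, stack_base + offset + 4
        offset += 4
    return None

# Stack words starting at %esp = 0x12f29c, taken from the !analyze output.
STACK_BASE = 0x0012F29C
STACK_WORDS = struct.pack("<4I", 0x669C22D7, 0x0012F318, 0x0012F310, 0x00000000)

first = scan_for_return_address(STACK_WORDS, STACK_BASE, 0x0012F29C, MODULES)
print(first and tuple(hex(v) for v in first))
```

On this excerpt the scan recovers the same eip and esp as frame 1 in the listing. It also shows why scanning produces false positives: any stack word that happens to fall inside a module range is accepted, whether or not it is a real return address.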
Comment 28 • 15 years ago (Assignee)
I'm going to test this patch a little more. It gives us *something* for these crashes @0x0, which is better than the nothing we have now. I want to make sure that it's not going to be worse for other cases, though. Since we have that sampling of 24 hours of minidumps from our production system, I'm going to try running the patched and unpatched stackwalker against a bunch of them, and compare the results.
Assignee: jmuizelaar → ted.mielczarek
Comment 29 • 15 years ago (Reporter)
The biggest problem keeping us from getting a good stack here is that we don't have proper unwind info. Another problem is that we seem to assume that there is a frame pointer when we do the search of the stack when we have unwind info. Fixing either issue should fix this stack. Getting proper unwind info is the easiest from a correctness standpoint :)
Assignee: ted.mielczarek → jmuizelaar
Updated • 15 years ago (Reporter)
Assignee: jmuizelaar → ted.mielczarek
Comment 30 • 15 years ago (Assignee)
Unfortunately that's also the part that's hardest to deal with, since we have to work with whatever Visual C++ is producing.
Comment 31 • 15 years ago (Assignee)
I ran this against a small set of crashes from the minidump collection, and it looks like it never makes things worse. You can see the minidump_stackwalk output (and diffs between old and new) here:
http://people.mozilla.org/~tmielczarek/breakpad-stacks/
This makes sense, since we only really take this code path in a case where we would otherwise just give up and quit walking the stack.
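A comparison like this can be scripted. The sketch below is one possible way to do it, not the script that was actually used: it diffs two saved stackwalk outputs and reports which lines were removed and which were added. The sample strings are illustrative, modeled on the frames quoted earlier in this bug.

```python
import difflib

def compare_stacks(old_output, new_output):
    """Return (removed, added) lines between two stackwalk outputs."""
    diff = list(difflib.unified_diff(
        old_output.splitlines(), new_output.splitlines(), lineterm="", n=0))
    removed, added = [], []
    for line in diff[2:]:  # skip the '---' and '+++' header lines
        if line.startswith("-"):
            removed.append(line[1:])
        elif line.startswith("+"):
            added.append(line[1:])
    return removed, added

# Illustrative outputs: the unpatched walker stops after frame 0.
OLD = "0  0x0"
NEW = "0  0x0\n1  QuickTime.qts + 0x1c22d6"

removed, added = compare_stacks(OLD, NEW)
print(removed, added)
```

An empty `removed` list with a non-empty `added` list is the pattern one would hope to see: the patched walker keeps every frame the old one found and only appends more.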
Comment 32 • 15 years ago (Assignee)
Cleaned up the patch a little and submitted it for review upstream:
http://breakpad.appspot.com/32003
Comment 33 • 15 years ago (Assignee)
Landed upstream:
http://code.google.com/p/google-breakpad/source/detail?r=409
Filed bug 521231 on getting our production copy updated.
Depends on: 521231
Comment 34 • 15 years ago (Assignee)
There is also a set of crashes @0x1. Hopefully this patch will help those as well:
http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.5.3&platform=windows&query_search=signature&query_type=exact&query=&date=&range_value=1&range_unit=weeks&do_query=1&signature=%400x1
Summary: Some crashes don't get unwound by the minidump processor usefully [@ @0x0] → Some crashes don't get unwound by the minidump processor usefully [@ @0x0] [@ @0x1]
Comment 35 • 15 years ago (Assignee)
New code is in production, I think we're done here. We can file a new bug if we find some other type of crash that the stack walker does a bad job on.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Comment 36 • 15 years ago
Here's a crash with a minimal, completely useless stack: bp-2ab122c7-f0f1-457b-ae68-d6ff32090701
Can the fix for this bug help with it?
Comment 37 • 15 years ago (Assignee)
It's difficult to say without having access to a specific minidump. Note that the fix for this has been rolled out in production, so crash reports processed after 2009-10-14 (in the evening) will have this fix in effect.
Updated • 15 years ago
Whiteboard: [crashkill]
Component: General → Socorro
Product: Core → Webtools
QA Contact: general → socorro
Updated • 13 years ago
Component: Socorro → General
Product: Webtools → Socorro