Closed Bug 948614
Opened 12 years ago
Closed 6 years ago

Stackwalking of crashes at unknown addresses should be improved

Categories

(Toolkit :: Crash Reporting, defect)

Platform: x86_64 Linux
Type: defect
Priority: Not set
Severity: normal

Tracking

RESOLVED DUPLICATE of bug 1310314

People

(Reporter: benjamin, Unassigned)

Details

(Whiteboard: [crashkill:P1])

bp-ba024ef5-ddbd-47ec-af0d-d791d2131203

In this crash, we're crashing at 0x0 and the reported stack is:

0x0
__RtlUserThreadStart
_RtlUserThreadStart

When I use dumplookup, I see something more likely:

0x0
NPSWF32_11_9_900_117.dll!F96146851____________________________ [F_317808310____________________________________________________________ : 207 + 0x0]
BaseThreadInitThunk
__RtlUserThreadStart
_RtlUserThreadStart

According to the JSON output of MDSW (minidump_stackwalk), we get from 0x0 to _RtlUserThreadStart by following the frame pointer. In this case at least, we can clearly do better by assuming that we just CALLed 0x0, so the return address should be at *$ESP. $EBP will at best be the frame pointer of the previous frame, in which case we skip one frame, but it may point several frames back.

I'm not sure how much we can generalize this. For all crashes that are not in a known module, we could "pretend" that the crash is a call to a bad address and, if *$ESP is a reasonable return address, use it instead of the frame pointer or falling back to scanning. That would certainly help cases like this one, where we're almost certainly calling a bogus address and crashing immediately, e.g. https://crash-stats.mozilla.com/report/index/1ef03d60-b441-4e02-b94c-0375b2131203

But I'm not sure whether it would harm other cases. Ted, any thoughts, and are you interested in taking this?
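The proposed "pretend it was a call to a bad address" step can be sketched roughly as below, in C++ since Breakpad is written in C++. This is not Breakpad's stackwalker API: `Module`, `Frame`, `Memory`, `FindModule` and `RecoverCallerAsBadCall` are hypothetical names, stack memory is modeled as a map of 32-bit words, and "reasonable return address" is crudely approximated as "lies inside some loaded module".

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for the stackwalker's view of the minidump (not Breakpad classes).
struct Module {
  std::string name;
  uint32_t base;
  uint32_t size;
};

struct Frame {
  uint32_t eip;  // instruction pointer for this frame
  uint32_t esp;  // stack pointer for this frame
};

// Captured stack memory, modeled as 4-byte words keyed by address.
using Memory = std::map<uint32_t, uint32_t>;

// Returns the module containing addr, or nullptr if addr is in no known module.
const Module* FindModule(const std::vector<Module>& modules, uint32_t addr) {
  for (const Module& m : modules) {
    if (addr >= m.base && addr - m.base < m.size) return &m;
  }
  return nullptr;
}

// If the crashing EIP is outside every known module, assume we just CALLed a bad
// address: the CALL pushed the return address, so it should sit at *ESP.
// Returns the caller frame only if *ESP looks like a plausible return address.
std::optional<Frame> RecoverCallerAsBadCall(const Frame& crash, const Memory& stack,
                                            const std::vector<Module>& modules) {
  if (FindModule(modules, crash.eip) != nullptr) return std::nullopt;  // normal unwinding applies
  auto it = stack.find(crash.esp);
  if (it == stack.end()) return std::nullopt;  // *ESP was not captured in the dump
  uint32_t ret = it->second;
  if (FindModule(modules, ret) == nullptr) return std::nullopt;  // implausible return address
  // Simulate the RET: the caller resumes at ret, with ESP just above the popped slot.
  return Frame{ret, crash.esp + 4};
}
```

When this returns nothing, the walker would simply carry on with whatever frame-pointer or scanning logic it already uses.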
Flags: needinfo?(ted)
The last time we thought about this, we invented stack scanning in bug 519616. I apparently filed bug 522701 back then because we still had some leftover crappiness. I don't have a problem with trying something smarter here in lieu of scanning. If we use the "ReturnAddressSeemsValid" heuristic to test *$ESP and fall back to scanning if that fails, it's likely to only produce better results.
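The suggested ordering (test *$ESP with a validity check, and fall back to scanning only if that fails) might look something like this sketch. The predicate passed in stands in for the "ReturnAddressSeemsValid" heuristic, whose real signature isn't given here; `RecoverCaller`, `Recovered`, `Memory` and the 40-word scan limit are hypothetical choices for illustration, and frame-pointer unwinding is deliberately left out.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <optional>

// Captured stack memory, modeled as 4-byte words keyed by address (hypothetical stand-in).
using Memory = std::map<uint32_t, uint32_t>;

struct Recovered {
  uint32_t eip;    // caller's instruction pointer (the recovered return address)
  uint32_t esp;    // caller's stack pointer after popping that return address
  bool from_scan;  // true if found by scanning rather than directly at *ESP
};

// Hypothetical predicate standing in for the "ReturnAddressSeemsValid" check.
using SeemsValid = std::function<bool(uint32_t)>;

constexpr int kMaxScanWords = 40;  // arbitrary scan limit chosen for this sketch

std::optional<Recovered> RecoverCaller(uint32_t esp, const Memory& stack,
                                       const SeemsValid& seems_valid) {
  // 1. Assume a call to a bad address: the return address should be at *ESP.
  if (auto it = stack.find(esp); it != stack.end() && seems_valid(it->second)) {
    return Recovered{it->second, esp + 4, false};
  }
  // 2. Otherwise fall back to scanning: walk up the stack word by word and
  //    take the first value that passes the same validity check.
  for (int i = 1; i <= kMaxScanWords; ++i) {
    uint32_t addr = esp + 4u * static_cast<uint32_t>(i);
    auto it = stack.find(addr);
    if (it == stack.end()) break;  // ran past the captured stack memory
    if (seems_valid(it->second)) return Recovered{it->second, addr + 4, true};
  }
  return std::nullopt;
}
```

Because the scan applies the same validity test, a *$ESP candidate is only accepted when it passes the check that scanning relies on anyway.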
Flags: needinfo?(ted)
I can probably find time to fix this. Is dmajor interested in dipping a toe into the Breakpad code?
It would probably be good if he learned his way around some!
Flags: needinfo?(dmajor)
Does this problem live in code that I could step through locally? My favorite source editor is my debugger. If I'd have to deal with some remote server, it sounds less attractive :)
Flags: needinfo?(dmajor)
Yes, you can check out the Breakpad source from SVN and build minidump_stackwalk: http://code.google.com/p/google-breakpad/source/checkout

However, it uses an autoconf build system, so you have to build it on a POSIX system (Cygwin works, but then you're still stuck with GDB). It probably wouldn't be terribly hard to whip up an MSVC project to build minidump_stackwalk, but I can't guarantee that all the code actually builds with MSVC. I fixed a lot of it so we could use it in unit tests, but not all of it. It needs to build all the static libs listed here: http://code.google.com/p/google-breakpad/source/browse/trunk/Makefile.am#118 and link them like so: http://code.google.com/p/google-breakpad/source/browse/trunk/Makefile.am#1022
Whiteboard: [crashkill:P1]
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → DUPLICATE