Closed Bug 536271 Opened 15 years ago Closed 14 years ago

Include the page containing EIP in the minidump

Categories

(Toolkit :: Crash Reporting, defect)

x86
All
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- beta7+

People

(Reporter: dmandelin, Assigned: ted)

References

Details

Attachments

(3 files)

(Spun off from bug 530955 comment 22.)

In order to debug on-trace (jitted code) crashes, we need the code at the point of the crash. As a first step, we'd like the page that contains EIP (at the time of the crash) included in the minidump. This is only needed for jitted code; if breakpad can tell whether the crash EIP is in jit code (looking at segments or something?), then it doesn't need to do this if the crash is in normal code. 

If it's cheap enough, getting an extra page on either side is probably a good idea too.
This will require some upstream changes. For the OS X / Linux cases it will probably actually be easier, since we're writing the minidump out manually. On Windows it will be a little trickier since we're using MinidumpWriteDump.
Carrying over blocking nomination from the dupe. Interestingly, the Breakpad minidump format documentation claims that a normal minidump already contains the 256 bytes around eip:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/google_breakpad/common/minidump_format.h#272

I'll see if I can verify that.
blocking2.0: --- → ?
blocking2.0: ? → beta7+
From minidump examination, it looks like this is the case on Windows. I'm not sure that it's present on OS X/Linux. If you can point me at a crash report in JIT coded where this would be useful, I can attempt to verify it.

If this is WFM on Windows, would you still want to block on implementing it for OS X/Linux. (On Windows you can obviously pull the data out via a debugger, on OS X/Linux it's not as easy, but you could get it via the minidump_dump tool.)
(In reply to comment #4)
> From minidump examination, it looks like this is the case on Windows.

What does that mean, exactly? Does it show up in the disassembly view of the debugger? I thought I had tried a minidump and couldn't get code memory.
Yeah, I just opened a random minidump I had laying around in WinDBG, and I was able to open the disassembly view and see the 256 bytes around %eip there. (The surrounding memory was all ??s.)
Dave: did you mean this to be beta7 or betaN? Upstream changes make me nervous about time, and this isn't a recent regression ...
Assignee: nobody → ted.mielczarek
This is a pretty big deal for effectively diagnosing JIT-related crashes.  I'm not sure what the upstream risk is, but we can control what we ship whether it's integrated in the google repository or not, I think.
blocking2.0: beta8+ → beta7+
I tested a random Win32 minidump I had laying around, and it does in fact include 256 bytes around EIP. However, dmandelin pointed me at a crash:
https://crash-stats.mozilla.com/report/index/8974e974-9664-4843-a4dc-9badd2100910

This dump contains 256 bytes around EIP, but it's the EIP from the top of the stack where the dump was actually written, not the one from the exception record, which means it's a disassembly of the memory around KiFastSystemCallRet. Not very useful.

It could be that the JIT throws off Microsoft's heuristic here. Perhaps it only attempts to include that memory if it's inside a loaded module? I've found some documentation on how to include extra memory regions on Windows, so I can try forcing it to be included.
So I started looking into patching this. As a first step, I started writing a unit test, of course. Attached is the test body. I ran this as part of the Google Windows client unittests, and it produced a minidump. Before writing the code to programmatically check the minidump memory contents, I opened it in WinDBG, typed ".ecxr" to get the exception record loaded, and opened the memory view, only to see the memory the test program had written in plain sight.

It definitely looks to me like Windows is trying to do this correctly. It must be something that our JIT(s?) does differently from this test program that causes it not to work.

dmandelin: can you take a look at the code snippet here and see what would differ from NanoJIT/(whatever JM uses)? Perhaps memory protection flags?
Here's the minidump it produced, if you'd like to see for yourself.
WTF and NanoJIT's allocators look identical to the one in my test, AFAICT:
http://mxr.mozilla.org/mozilla-central/source/js/src/assembler/jit/ExecutableAllocatorWin.cpp#44
http://mxr.mozilla.org/mozilla-central/source/js/src/nanojit/avmplus.cpp#106

is there some other difference I'm missing here?
I really can't produce a patch here without some help figuring out why crashes in our JITted code produce different results than my testcase, since it's currently impossible for me to test and verify a fix.
(I am, however, working on Mac and Linux fixes.)
I have working patches for Linux and Mac. The Linux patch is up for upstream review: (should be quick, and there's nobody at Moz that knows this code better anyway)
http://breakpad.appspot.com/194001/show

I'll post the Mac patch in a little bit.

I'm still stumped by the Windows behavior. Without being able to reproduce the current failure mode in a test, I don't have high confidence in being able to deliver a fix.
Linux patch reviewed and landed upstream:
http://code.google.com/p/google-breakpad/source/detail?r=693

Mac patch up for review:
http://breakpad.appspot.com/200001/show

Once that gets reviewed I should be able to land them on mozilla-central. Any thoughts about the Win32 case are appreciated. I'm willing to try any theories you might entertain.
While I'm at it, I wrote a little tool that will dump the memory around the instruction pointer from the exception record for a minidump:
http://hg.mozilla.org/users/tmielczarek_mozilla.com/get-minidump-instructions/

There's a pre-built 64-bit Linux binary here:
http://people.mozilla.com/~tmielczarek/get-minidump-instructions
Shaver/Sayrer: please see comment 15; looks like we're set for Windows and Linux, but need help to resolve for Windows which I take to be important given comment 8
Can we get the Linux and OSX parts landed, please, and file a follow-up marked as beta8+ for Windows?
Whiteboard: [patches in hand, follow up for Windows]
I didn't get upstream review on the OS X patch yet, I'll review-ping.

I'll file a Windows follow-up, but without assistance I don't know how to fix it. The Breakpad code just calls a Microsoft library to write minidumps, which appears to do the right thing in my testing. It clearly doesn't do the right thing for all of our crashes, but I'm at a loss to explain why that is.
Upstream OS X patch has review, I'll clean it up and land them both on m-c tomorrow.
Whiteboard: [patches in hand, follow up for Windows] → [patches in hand, follow up for Windows][ETA: 09/23]
Ok, landed upstream:
http://code.google.com/p/google-breakpad/source/detail?r=699

I'm getting a m-c patch together now.
Comment on attachment 477973 [details] [diff] [review]
Include the page containing EIP in the minidump on Linux and Mac.

This is the combined patch merged to m-c. It builds and works on my local system, I'm pushing it to try (along with my other b7 blocker) to make sure it doesn't break things.
Attachment #477973 - Flags: review+
I pushed this to try and it looks good. I don't have time to watch the tree right now, but if someone else wanted to land it that'd be fine. (I'm still waiting on try results for the patch on bug 598507 and was going to push them both.)
Pushed to m-c:
http://hg.mozilla.org/mozilla-central/rev/8b84f9ba869c

Will file the Win32 followup shortly.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Whiteboard: [patches in hand, follow up for Windows][ETA: 09/23]
Blocks: 599301
Depends on: 615534
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: