Closed Bug 599301 Opened 9 years ago Closed 9 years ago
Make Breakpad include memory around instruction pointer in minidumps on older versions of Windows
The URL contains a crash in JITed code. According to Microsoft's documentation (and my testing), minidumps are supposed to include a memory region around the instruction pointer of the crashing thread. However, Firefox crashes in our JITed code don't seem to include this. The crash in the URL contains a memory region around the instruction pointer from the thread that wrote the minidump, so it's just the memory around kernel32!KiFastSystemCallRet, which is not at all helpful. I haven't been able to reproduce this behavior in my test program, so I'm having a hard time reducing the problem to something that I can fix. I'll upload a copy of a Breakpad tree with my test program somewhere if someone else wants to fiddle with it.
beltzner requested that this be blocking-beta8+ in bug 536271. Without some help figuring out what's going wrong, I don't know how to fix it.
blocking2.0: ? → beta8+
sounds like the dump api doesn't think it's a code region and isn't including it. there's probably an attribute to the memory region that's missing :)
ted, are you still looking for help on this? sounds like it also blocks diagnosis of the #5 trunk topcrash (bug 595351) which actually would be really good to fix for b7, or at least use b7 crash feedback to understand that bug better.
Yeah, I still have no idea why this happens, and I'm not actively looking into it. (I hit a wall, and I have no idea how to proceed.)
Perhaps the JS guys can give you some hints?
What have you tried?
You can see the unittests I wrote here: (you should be able to apply this patch to a Breakpad SVN checkout and build it on Windows if you so desire) http://hg.mozilla.org/users/tmielczarek_mozilla.com/breakpad-mq/file/f0318dfc809f/win32-ip-memory#l134 It allocates some executable memory the same way as our JITs do, writes some bad instructions into it, and calls the memory like a function. All of the tests pass when I run them, showing that the minidump contains the memory around the faulting IP.
Timeless, does bug 536271 (blocker to this bug) address your bug Bug 518811 comment 1? (startup crash [@ strchr | XPT_DoCString]) If so, then does bug 536271 address not having enough information to debug issues of failures in js code (something I mentioned to Laura a few months ago). Or does it only address diagnosing certain limited aspects of JIT?
comment 2 rephrased (for Ted also) ... does bug 536271 address a) determining much better how JIT is failing in cases like bug Bug 518811, AND b) does it any way also help identify how user js code is failing?
bug 518811 doesn't seem to have anything to do with the JIT. The patches on bug 536271 will allow better debugging of JIT crashes from minidumps because the JITted instructions being executed will be present in the minidump.
Ted, do have any advice on building this on Windows? I checked out the SVN repo using mozillabuild, then ran the gyp script to generate the MSVC project files, then opened src\client\windows\breakpad_client.sln, opened that in MSVC2005, then tried to build. I got these errors: 1>.\exception_handler_test.cc(245) : error C2065: 'MemoryInfoListStream' : undeclared identifier 1>.\exception_handler_test.cc(247) : error C2065: 'ThreadInfoListStream' : undeclared identifier 1>.\exception_handler_test.cc(249) : error C2065: 'HandleOperationListStream' : undeclared identifier 1>.\exception_handler_test.cc(251) : error C2065: 'TokenStream' : undeclared identifier 1>minidump_test.cc 1>.\minidump_test.cc(198) : error C2065: 'MemoryInfoListStream' : undeclared identifier 1>.\minidump_test.cc(199) : error C2065: 'ThreadInfoListStream' : undeclared identifier 1>.\minidump_test.cc(200) : error C2065: 'HandleOperationListStream' : undeclared identifier 1>.\minidump_test.cc(201) : error C2065: 'TokenStream' : undeclared identifier
You might need one of the other patches in my queue or something. I zipped up my srcdir with the patches applied: http://people.mozilla.com/~tmielczarek/google-breakpad-windows.zip I just successfully built that and ran the tests with VC 2008.
Ted, I built Breakpad minus the tests and linked it to my own tiny project where I do a test case much like yours. And it worked for me too--I opened the minidumps and got memory on both sides of the crash address. I think next I'll try a browser build with breakpad and inject a crash into the JM jitcode, and see what that does.
shaver suggested (and I attempted to check, but was foiled by my WinDBG being 64-bit and not playing very nice with 32-bit processes) comparing the attributes of the memory segments allocated in our test programs vs. in the JIT. I did look at the VirtualAlloc calls and noted that they were basically identical to my test, so I didn't pursue that further earlier.
(In reply to comment #14) > shaver suggested (and I attempted to check, but was foiled by my WinDBG being > 64-bit and not playing very nice with 32-bit processes) comparing the > attributes of the memory segments allocated in our test programs vs. in the > JIT. I did look at the VirtualAlloc calls and noted that they were basically > identical to my test, so I didn't pursue that further earlier. Yes, and I tried it both ways: with r/w/x all permitted, and with only execute permitted. It worked fine both ways.
Needs an owner: is that you dmandelin? or Ted with help?
blocking2.0: beta8+ → beta7+
I've been working on this when I have time, but mostly I have been occupied with other beta7 blockers.
As I said in comment 1, I have no idea how to proceed here.
The plan I had was to delta debug between two confs: 1. Works correctly: build a tiny test program that generates fake jitcode into a buffer that is allocated the same way as JM buffers and crashes in that code. 2. Doesn't seem to work: crashdumps coming in from nightlies. My first step was going to be to build a browser with breakpad, then instrument JM to generate code that will crash, and see what happens. That might be debuggable somehow. Another possibility would be that that actually works--then we'd have to find out why that doesn't seem to be happening in the crashreports.
OK, I just did the test with an injected crash into JM code into a tinderbox build with breakpad enabled, and...it worked. I got about 176 bytes around EIP. I will revisit some jitcrashes from the field when I get time. One idea I have is that earlier I might have looked at a crash where we jumped into invalid memory, in which case of course we would not be able to include that memory in the minidump.
I just tried the minidump for this: https://crash-stats.mozilla.com/report/index/197f5919-6d1a-4b8e-825b-569082101028 and it does contain the code memory. Not sure what happened with the minidump that started this bug, but apparently it is not a problem in general. And I think Ted's tests and mine show that breakpad is functioning as it should.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
I just had a breakthrough here. dvander was poking around at minidumps for bug 595351, and he casually noted that the dumps he was downloading from Windows XP systems didn't have the JIT code memory available. However, all the Windows 7 dumps I looked at did have it. In a head-slapping "duh" moment, I re-ran the unit tests from comment 7, but with dbghelp.dll from a Windows Server 2003 machine copied next to the test binary. In that situation, the tests all fail! Looks like older versions of dbghelp.dll don't do this correctly, so probably all our dumps from WinXP won't have this memory available. I think we can fix Breakpad to work around this.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Assignee: nobody → ted.mielczarek
Status: REOPENED → ASSIGNED
Crap. Actually, it looks like we can't. I'll do some testing, but the docs say that the callback function to include extra memory didn't get added until DbgHelp version 6.1. I'll find out if that's true tomorrow.
Testing shows that it seems to work anyway with the DbgHelp.dll from Win2k3. I guess version numbers of Windows-bundled DbgHelp bear no relation to the version numbers of independently-released DbgHelp. Crazy.
I've got a Breakpad patch up for review: http://breakpad.appspot.com/259001 This fixes the issue on Windows XP, at least according to my unit tests.
OS: All → Windows XP
Summary: Figure out why Windows minidumps don't include memory around instruction pointer when crashing in JITed code → Make Breakpad include memory around instruction pointer in minidumps on older versions of Windows
This landed upstream: http://code.google.com/p/google-breakpad/source/detail?r=763 I'll get a mozilla-central patch together shortly.
This patch was reviewed and landed upstream, I added an xpcshell test to verify the behavior. This built and passed tests on my Linux and Win7 machines, but I pushed it to try just for sanity's sake. Assuming this is green on try, I'd like to try to land it tomorrow and get it into b11, since it will give our JS devs a lot more useful data in minidumps from Windows XP.
Comment on attachment 507727 [details] [diff] [review] Make Breakpad include memory around instruction pointer in minidumps on older versions of Windows. a=beltzner, with the condition that you watch crash-stats very carefully after it lands in nightlies to ensure that it's doing what we expect!
Attachment #507727 - Flags: approval2.0? → approval2.0+
Pushed to m-c: http://hg.mozilla.org/mozilla-central/rev/846f2ea91e63
Status: ASSIGNED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.0b11
I checked three dumps from the 20110130030342 nightly on Windows XP, and they all contained the JIT code memory: https://crash-stats.mozilla.com/report/index/a7c52f10-2e34-47c0-a6bf-4d7672110131 https://crash-stats.mozilla.com/report/index/ef6988e5-033a-41fa-addf-551592110130 https://crash-stats.mozilla.com/report/index/63bf959e-d20f-4245-9a4e-bce1b2110130 Just for sanity, I also checked a Windows 7 dump and it also still had the memory: https://crash-stats.mozilla.com/report/index/a85fa334-d212-469b-8371-b9b2a2110130
You need to log in before you can comment on or make changes to this bug.