Closed Bug 1286802 Opened 8 years ago Closed 7 years ago

Add heap regions of the crash context to minidump (Windows)

Categories

(Toolkit :: Crash Reporting, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox50 --- affected
firefox56 --- fixed

People

(Reporter: cyu, Assigned: cyu)

References

(Blocks 3 open bugs)

Details

Attachments

(9 files, 4 obsolete files)

8.60 KB, patch
Details | Diff | Splinter Review
195.22 KB, image/png
Details
59 bytes, text/x-review-board-request
ted
: review+
Details
59 bytes, text/x-review-board-request
ted
: review+
Details
59 bytes, text/x-review-board-request
ted
: review+
Details
59 bytes, text/x-review-board-request
ted
: review+
Details
59 bytes, text/x-review-board-request
ted
: review+
Details
59 bytes, text/x-review-board-request
ted
: review+
Details
59 bytes, text/x-review-board-request
ted
: review+
Details
Currently the minidump only contains stack memory content, but often the objects that we are interested in is on the heap. It would be useful if we add the heap memory to the dump to help us investigate the crash.

Obviously, dumping the whole heap is unrealistic. Often we are only interested in the object that triggers access violation, like the object that is used after free(), the object that is still alive, but was corrupted by another object, etc. Ideally we'd like the crash dump code to "know" which heap areas are relevant and only include those heap areas to the dump. This is a very difficult, if possible, problem. Instead of solving that difficult problem, we can have an approximation heuristically.

The target is try to lean toward the left in the spectrum of
(Dump only the stack) <---------------------------------> (Dump the whole virtual address space)
but include more information in the dump that will help us debug.

One approach that might work is that when the crash happens, we try to treat each general-purpose CPU register value as a pointer to some object on the heap and expand from that point to an area to include to the minidump. For example:

EAX: 0x14ae00fe ---(treat this as a pointer)---> include the area of 1 page [0x14ae0000.....0x14ae0fff] to the dump.
EBX: 0x3 ---(treat this as a pointer) ---> is not a heap pointer. Skip it.

The rationale behind this is that on the top frame of the crashing thread, some register is used to trigger the access violation. We follow this "link" to include the memory area. There are rooms for improvement for this approach, but I will try this as a starting point and see how much it helps.
Attached patch Proof of concept (obsolete) — Splinter Review
Hi Benjamin,

Do you have any concern about including some heap areas to the minidump to help debugging? Thanks.
Flags: needinfo?(benjamin)
RFC for :ted.
Flags: needinfo?(ted)
This significantly increases the risk that user data will be included in the dump, which has in the past been something we want to avoid. I think we need to be very careful about this. I'm fine with trying this on nightly and devedition, to see how much it helps: but before we get into release, we should apply additional minimization, such as only doing this for crashes where engineers have asked for additional data (similar to winqual). We can probably use data from the client stackwalking system to inform this decision, so I encourage you to reach out to ddurst and gsvelto to discuss.

OTOH, I think you should probably be *more* aggressive about this for pages of JITcode. Either do this dereference lookup for all RX pages, or for all RX pages which are not a DLL mapping.
Flags: needinfo?(benjamin)
On Windows we could try using MiniDumpWithIndirectlyReferencedMemory:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms680519(v=vs.85).aspx
"Include pages with data referenced by locals or other stack memory. This option can increase the size of the minidump file significantly."

For other platforms obviously we'd have to implement support for this in Breakpad.
Flags: needinfo?(ted)
The PoC is not usable as written (which you probably already knew), since RegisterAppMemory allocates memory. The Windows minidump writing code uses a `std::list` to store the list of memory regions, and reserves one entry at the front to put in the memory around the crashing instruction pointer:
https://dxr.mozilla.org/mozilla-central/rev/88bebcaca249aeaca9197382e89d35b02be8292e/toolkit/crashreporter/google-breakpad/src/client/windows/handler/exception_handler.cc#249
https://dxr.mozilla.org/mozilla-central/rev/88bebcaca249aeaca9197382e89d35b02be8292e/toolkit/crashreporter/google-breakpad/src/client/windows/handler/exception_handler.cc#998

and then `MinidumpWriteDumpCallback` iterates over that list and includes each entry as a region in the dump:
https://dxr.mozilla.org/mozilla-central/rev/88bebcaca249aeaca9197382e89d35b02be8292e/toolkit/crashreporter/google-breakpad/src/client/windows/handler/exception_handler.cc#875

We could extend that to include a fixed number of pre-allocated `AppMemory` objects that we could fill in using the technique from your patch. If we do that we might want to add a `valid` field to the `AppMemory` struct so we only write valid entries to the dump.
Changes:

* Use HeapAlloc()/HeapFree() in AppMemoryList to address the problem that we need to allocate memory during exception handling.
* Reduce the heap area size to include in the minidump.
* Enable/disable this feature using an environment variable for testing.
Attachment #8770911 - Attachment is obsolete: true
I made random samples with some recent crashes on bugzilla and try to reproduce the crash with the help of Visual Studio debugger. I simulate the crash condition by nulling out some object member or vptr in the memory at the line of crash. Because the code has been changed/fixed since the crash, the crash condition is not 100% the same as original.

Here are some test results:

* This approach is effective in access violations in using an object, like dereferencing a freed member of an object, use-after-free (given that the original page is not unmapped), etc.
* This approach is not effective in aborts, assertion failures, etc. if the object in interest is multiple stack frames from the crashing frame. To address this problem, we may try to unwind the stack several frames and try to find the pointers stored on the stack to include more heap areas, but this further increases the risk that we include user data in the dump.
* Tests with MiniDumpWithIndirectlyReferencedMemory are also made, but contrary to expectation, this dump type doesn't include the crashing object as expected.
* The size increase of the dump is about 100KB (599 vs 684KB for example). The size of MiniDumpWithIndirectlyReferencedMemory dump is smaller than the one with the patch (it doesn't significantly increase the dump size as it claims).
This is screenshot of the dump (simulated crash of bug 1261702) loaded in Visual Studio. It can be seen that the dump includes the content of *this* (refcount and other data members) on the top frame.
Attachment #8798328 - Flags: review?(benjamin)
Assignee: nobody → cyu
Comment on attachment 8798328 [details]
Bug 1286802 - Part 1: Enable heap regions in the minidump with opt-in pref or on dev edition.

https://reviewboard.mozilla.org/r/83850/#review83014

::: toolkit/xre/nsAppRunner.cpp:4263
(Diff revision 1)
>      CrashReporter::AnnotateCrashReport(NS_LITERAL_CSTRING("useragent_locale"), userAgentLocale);
>    }
> +
> +#if defined(XP_WIN) || defined(XP_LINUX)
> +    bool includeHeapRegions =
> +#ifndef RELEASE_BUILD

It would be better to use a pref always for this, and set the default value of the pref #ifdef RELEASE_BUILD in all.js/firefox.js

Also, please update the docs at http://gecko.readthedocs.io/en/latest/toolkit/crashreporter/crashreporter/index.html to document exactly what this does and the pref.
Attachment #8798328 - Flags: review?(benjamin) → review-
And note that RELEASE_BUILD was just renamed as RELEASE_OR_BETA (bug 1304829).
Ted, will you have time for the review of part 2 and 3? Thanks.
Flags: needinfo?(ted)
I'm going to review these patches today.

As I'm looking at them I had a thought--Google isn't actively developing the Breakpad client code for Windows or OS X anymore (they're using Crashpad). They do make changes to the Linux client but it's mostly for ChromeOS support. I wonder if we wouldn't be better served just forking the Breakpad client code into mozilla-central instead of working with it from upstream? We could keep using the symbol dumping and stackwalking bits from upstream, since those are still in use at Google and see active development (until I get rust-minidump to a state where we can replace the Breakpad code with it).
Flags: needinfo?(ted)
I filed bug 1336548 on forking the Breakpad client, FWIW.
I think we should just implement this as an option in the Breakpad ExceptionHandler classes, it should be a lot more straightforward there. A lot of the complexity in these patches seems to be from trying to extend the `RegisterAppMemory` API to work from the filter callback. If we just added an `ExceptionHandler::set_include_context_heap(bool)` API, it could handle it with much less fuss. On non-Windows the implementation would be pretty simple. On Windows it'd still be a little bit of a pain because we'd have to do it in the MinidumpWriteDump callback, but it shouldn't be very hard to do.
The client fork has landed, so you can now feel free to rewrite the ExceptionHandler APIs as necessary without worrying about other users.
Comment on attachment 8798329 [details]
Bug 1286802 - Part 2: Add support for registering app memory under exception context.

https://reviewboard.mozilla.org/r/83852/#review112550

I think you should drop this patch and just do the work in Part 3 directly in the ExceptionHandler or MinidumpWriter classes directly now that we've forked them.
Attachment #8798329 - Flags: review?(ted) → review-
Comment on attachment 8798330 [details]
Bug 1286802 - Part 3: Add heap regions to the minidump using the register values in the exception context.

https://reviewboard.mozilla.org/r/83854/#review112548

I like the concept but as I said this will be much simpler to just implement inside the Breakpad client code. The Linux and OS X implementations should be pretty simple--there's already a special case for the crashing thread's instruction pointer memory, you'd just be looking at the other registers as well and adding memory for them. On Windows it'll be a little bit more complicated because we have to add the regions from the `MinidumpWriteDumpCallback`. I can think of a couple of ways around that, but maybe the simplest would be to just preallocate as many `AppMemory` entries as we have potential register values (like how we currently preallocate one for the instruction pointer memory), but add a flag to them so we can mark them as invalid until we check them all before writing the dump. Then the callback could just skip entries in the list that aren't marked valid and you wouldn't have to do any heap allocation. 

Since we've forked the code you can just write it to always add these memory regions which should simplify things quite a bit.

::: toolkit/crashreporter/nsExceptionHandler.cpp:1429
(Diff revision 1)
> +
> +  if (memInfo.Protect == PAGE_READWRITE) {
> +    return true;
> +  }
> +
> +  return true;

Is this supposed to be `return false`?

::: toolkit/crashreporter/nsExceptionHandler.cpp:1484
(Diff revision 1)
> +    lower -= sHeapRegionSize / 2;
> +
> +    char* upper = lower + sHeapRegionSize;
> +    // addr is a heap address. Find out the range to include in the dump.
> +    if (!IsHeapAddress(upper)) {
> +      upper -= reinterpret_cast<RegisterValueType>(upper) % sPageSize;

I think what you ought to do here instead is basically `lower = max(meminfo.BaseAddress, lower)` and `upper = min(meminfo.BaseAddress + meminfo.RegionSize, upper)` since the info you get back from VirtualQuery already tells you the bounds of the memory region. If you move this code into exception_handler.cc you can just reuse the existing code that does that for the memory around the instruction pointer:
https://dxr.mozilla.org/mozilla-central/rev/f505911eb333d5ae8c2bf5c44f7b85add6450b53/toolkit/crashreporter/google-breakpad/src/client/windows/handler/exception_handler.cc#978

::: toolkit/crashreporter/nsExceptionHandler.cpp:1620
(Diff revision 1)
> +
> +  uint8_t mapped = 0;
> +
> +  // To be precise, we should parse /proc/self/maps, but here we use mincore()
> +  // as an approximation.
> +  if (mincore(reinterpret_cast<void*>(addr), sPageSize, &mapped)) {

Similary, if you move this code into the Linux minidump_writer.cc you can just reuse the code for the instruction pointer, which has already parsed the maps for you:
https://dxr.mozilla.org/mozilla-central/rev/f505911eb333d5ae8c2bf5c44f7b85add6450b53/toolkit/crashreporter/google-breakpad/src/client/linux/minidump_writer/minidump_writer.cc#349
Attachment #8798330 - Flags: review?(ted) → review-
Comment on attachment 8798328 [details]
Bug 1286802 - Part 1: Enable heap regions in the minidump with opt-in pref or on dev edition.

https://reviewboard.mozilla.org/r/83850/#review112560

I don't have anything to add beyond bsmedberg's comments here. Is there a reason you want this to be opt-in? Obviously there are PII concerns with heap memory, but I'm not sure that it materially changes the PII concerns around minidumps.
Attachment #8798328 - Flags: review?(ted)
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #21)
> I don't have anything to add beyond bsmedberg's comments here. Is there a
> reason you want this to be opt-in? Obviously there are PII concerns with
> heap memory, but I'm not sure that it materially changes the PII concerns
> around minidumps.
It's opt-in on beta and release for PII concerns and limiting the impact. If in proves that the PII concerns are not an issue in practice, we can turn this on by default.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #20)
Thanks for the review. I'll take the breakpad-centric approach and update the patch.
Attachment #8798328 - Attachment is obsolete: true
Attachment #8798329 - Attachment is obsolete: true
Attachment #8798330 - Attachment is obsolete: true
Summary: Add some heap areas to the minidump → Add heap regions of the crash context to minidump (Windows)
Blocks: 1351277
Filed bug 1351277 to handle the non-Windows platforms.
Comment on attachment 8851965 [details]
Bug 1286802 - Part 1: Add minidump_callback.{h|cpp} to Windows breakpad client.

https://reviewboard.mozilla.org/r/124218/#review128584
Attachment #8851965 - Flags: review?(ted) → review+
Comment on attachment 8851966 [details]
Bug 1286802 - Part 2: Refactor google_breakpad::ExceptionHandler for including heap regions.

https://reviewboard.mozilla.org/r/124220/#review128586
Attachment #8851966 - Flags: review?(ted) → review+
Comment on attachment 8851967 [details]
Bug 1286802 - Part 3: Add empty set_include_context_heap() implementation to ExceptionHandler and CrashGenerationServer.

https://reviewboard.mozilla.org/r/124222/#review128590

::: modules/libpref/init/all.js:5681
(Diff revision 2)
>  pref("layers.advanced.outline-layers", 2);
> +
> +// When a crash happens, whether to include heap regions of the crash context
> +// in the minidump. Enabled by default on nightly and aurora. 
> +#ifdef RELEASE_OR_BETA
> +pref("toolkit.crashreporter.includeContextHeap", false);

I'm not sure we have a totally consistent naming convention for preferences, but I think we more often use underscores instead of camelCase, so maybe `crashreporter.include_context_heap`? (The toolkit bit also seems extraneous, although we do have another pref that starts with that...)

::: toolkit/crashreporter/nsExceptionHandler.cpp:256
(Diff revision 2)
>  // Avoid a race during application termination.
>  static Mutex* dumpSafetyLock;
>  static bool isSafeToDump = false;
>  
> +// Whether to include heap regions of the crash context.
> +static bool sIncludeContextHeap = false;

I wonder whether we ought to default this to `true` so that startup crashes that happen before we get a profile always get context heap data?
Attachment #8851967 - Flags: review?(ted) → review+
Comment on attachment 8851968 [details]
Bug 1286802 - Part 4: Implemention of including heap regions from the crashing context.

https://reviewboard.mozilla.org/r/124224/#review128594

::: commit-message-5729c:8
(Diff revision 2)
> +This patch is the core part of including heap regions using the registers of
> +the crashing context. It works as follows:
> +
> +When set_include_context_heap() is called, ExceptionHandler or CrashGenerationServer
> +preallocates a number of AppMemory instances specificfor heap regions of the crash
> +context.

This part of the commit message isn't actually true, is it? This code isn't actually hooked up until the next patch in the series.

::: toolkit/crashreporter/breakpad-client/windows/common/minidump_callback.cc:52
(Diff revision 2)
> +      !(memInfo.Protect == PAGE_EXECUTE_READ && memInfo.Type == MEM_PRIVATE)) {
> +    return false;
> +  }
> +
> +  // Try to include a region of size sHeapRegionSize around aRegister, bounded
> +  // by the [BaseAddress, BaseAddress + RegionSize].

Can we share some of this code with the existing code that does this for the instruction pointer?

https://dxr.mozilla.org/mozilla-central/rev/38894655c89e68bcd8f45d31a0d3005f2c2b53db/toolkit/crashreporter/breakpad-client/windows/handler/exception_handler.cc#977

Alternately, I guess you could just remove the special handling for the instruction pointer since you already have eip/rip in the list below. We would have to have some other special handling to continue to add that memory to the dump even if this "add additional heap memory" pref is disabled, I don't know if that'd make things better or worse.

::: toolkit/crashreporter/breakpad-client/windows/handler/exception_handler.cc:242
(Diff revision 2)
>  
>    // Reserve one element for the instruction memory
>    AppMemory instruction_memory;
>    instruction_memory.ptr = NULL;
>    instruction_memory.length = 0;
> +  instruction_memory.preallocated = false;

It seems a little weird to call this reserved memory block *not* preallocated, but I guess that ties into my other comment about reusing the same machinery for the instruction pointer + other register values.
Attachment #8851968 - Flags: review?(ted) → review+
Comment on attachment 8851969 [details]
Bug 1286802 - Part 5: Implement ExceptionHandler::set_include_context_heap() for inprocess crash generation.

https://reviewboard.mozilla.org/r/124226/#review128640

One small thing that needs to be fixed, but otherwise looks good.

::: toolkit/crashreporter/breakpad-client/windows/handler/exception_handler.cc:966
(Diff revision 2)
>        MinidumpCallbackContext context;
>        context.iter = app_memory_info_.begin();
>        context.end = app_memory_info_.end();
>  
> +      if (include_context_heap_) {
> +        IncludeAppMemoryFromExceptionContext(::GetCurrentProcess(),

You want to use `process` here, which is one of the parameters to `WriteMinidumpWithExceptionForProcess`.

::: toolkit/crashreporter/breakpad-client/windows/handler/exception_handler.cc:1040
(Diff revision 2)
>      app_memory_info_.erase(iter);
>    }
>  }
>  
>  void ExceptionHandler::set_include_context_heap(bool enabled) {
> +  if (enabled && !include_context_heap_) {

It's possible (although not very likely) for someone to call `set_include_context_heap(true); set_include_context_heap(false); set_include_context_heap(true);`, which would result in this code allocating extra preallocated app memory entries.
Attachment #8851969 - Flags: review?(ted) → review+
Comment on attachment 8851970 [details]
Bug 1286802 - Part 6: Implement CrashGenerationServer::set_include_context_heap() for OOP crash generation.

https://reviewboard.mozilla.org/r/124228/#review128642

This needs a little fixup, but it's overally pretty close.

::: toolkit/crashreporter/breakpad-client/windows/crash_generation/client_info.cc:153
(Diff revision 2)
>  
> +bool ClientInfo::GetClientContextRecord(EXCEPTION_POINTERS* ex_info,
> +                                        EXCEPTION_POINTERS* out_ex_info,
> +                                        CONTEXT* out_context) const {
> +  SIZE_T bytes_count = 0;
> +  if (!ReadProcessMemory(process_handle_,

Can you just call `GetClientExceptionInfo` here instead of effectively duplicating it?

Actually, I'm not really sure why you're doing this this way--the code that calls this already calls `GetClientContextRecord`, so you should just be able to use the `EXCEPTION_POINTERS*` it already has, right? It doesn't look like the code you added to call `GetClientContextRecord` actually uses the `out_ex_info` value.

::: toolkit/crashreporter/breakpad-client/windows/crash_generation/crash_generation_server.cc:902
(Diff revision 2)
>    SetEvent(client_info.dump_generated_handle());
>  }
>  
>  void CrashGenerationServer::set_include_context_heap(bool enabled) {
> +  if (enabled && !include_context_heap_) {
> +    // Preallocate AppMemory instances for exception context.

You don't really need to do this here for the out-of-process case, since it's safe to allocate memory at the time of dump writing (since this isn't the crashed process).

::: toolkit/crashreporter/breakpad-client/windows/crash_generation/crash_generation_server.cc:932
(Diff revision 2)
>    DWORD client_thread_id = 0;
>    if (!client.GetClientThreadId(&client_thread_id)) {
>      return false;
>    }
>  
> +  AppMemory app_memory_list[kExceptionAppMemoryRegions];

You're reserving space for `kExceptionAppMemoryRegions` here, but you're not actually using it--you're using `app_memory_info_` for those regions. You should either change this to not reserve the space, or change the code to use `app_memory_list` for the memory regions from the exception context.
Attachment #8851970 - Flags: review?(ted) → review-
Thanks for getting this done, I think it's going to be really useful! One thing I only just thought of after doing all those reviews--I didn't see anything to stop the code from adding the same memory region to the list multiple times (like if several registers all point into the same page). I'm not sure that it'd actually be harmful, but it'd be good to test and make sure that MinidumpWriteDump handles that gracefully, at least. It would be nice to not wind up with redundant copies of memory in the minidump or something silly like that.
Comment on attachment 8851967 [details]
Bug 1286802 - Part 3: Add empty set_include_context_heap() implementation to ExceptionHandler and CrashGenerationServer.

https://reviewboard.mozilla.org/r/124222/#review128590

> I wonder whether we ought to default this to `true` so that startup crashes that happen before we get a profile always get context heap data?

This requires that we preallocate the AppMemory instances and maybe destroy them if it is later set to false. The updated part 5 has checks in ExceptionHandler::set_include_context_heap() that avoids allocating more than enough AppMemory instances if called multiple times. We can leverage that.
Comment on attachment 8851968 [details]
Bug 1286802 - Part 4: Implemention of including heap regions from the crashing context.

https://reviewboard.mozilla.org/r/124224/#review128594

> Can we share some of this code with the existing code that does this for the instruction pointer?
> 
> https://dxr.mozilla.org/mozilla-central/rev/38894655c89e68bcd8f45d31a0d3005f2c2b53db/toolkit/crashreporter/breakpad-client/windows/handler/exception_handler.cc#977
> 
> Alternately, I guess you could just remove the special handling for the instruction pointer since you already have eip/rip in the list below. We would have to have some other special handling to continue to add that memory to the dump even if this "add additional heap memory" pref is disabled, I don't know if that'd make things better or worse.

This section of code can be removed merged into IncludeAppMemoryFromExceptionContext().
Comment on attachment 8851969 [details]
Bug 1286802 - Part 5: Implement ExceptionHandler::set_include_context_heap() for inprocess crash generation.

https://reviewboard.mozilla.org/r/124226/#review128640

> It's possible (although not very likely) for someone to call `set_include_context_heap(true); set_include_context_heap(false); set_include_context_heap(true);`, which would result in this code allocating extra preallocated app memory entries.

I merged the original logic including app memory around the instruction pointer. This is also updated accordingly to only preallocate if necessary after counting the preallocated AppMemory instance in the list.
Comment on attachment 8851970 [details]
Bug 1286802 - Part 6: Implement CrashGenerationServer::set_include_context_heap() for OOP crash generation.

https://reviewboard.mozilla.org/r/124228/#review128642

> Can you just call `GetClientExceptionInfo` here instead of effectively duplicating it?
> 
> Actually, I'm not really sure why you're doing this this way--the code that calls this already calls `GetClientContextRecord`, so you should just be able to use the `EXCEPTION_POINTERS*` it already has, right? It doesn't look like the code you added to call `GetClientContextRecord` actually uses the `out_ex_info` value.

GetClientExceptionInfo only reads in a pointer from the client process, i.e. it reads sizeof(void*) pointing to the EXCEPTION_POINTERS structure in the client process. We need more than that, that is the whole EXCEPTION_POINTERS structure and then the content of the CONTEXT structure. The updated version of this patch the function is renamed to PopulateClientExceptionContext() to make this intent clear.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #41)
> Thanks for getting this done, I think it's going to be really useful! One
> thing I only just thought of after doing all those reviews--I didn't see
> anything to stop the code from adding the same memory region to the list
> multiple times (like if several registers all point into the same page). I'm
> not sure that it'd actually be harmful, but it'd be good to test and make
> sure that MinidumpWriteDump handles that gracefully, at least. It would be
> nice to not wind up with redundant copies of memory in the minidump or
> something silly like that.

In my local test on Windows, if we RegisterAppMemory() multiple times, only one MDMemoryDescriptor entry is shown in the output of minidump_dump. I think MinidumpWriteDump handles duplication of memory regions well.
Comment on attachment 8851970 [details]
Bug 1286802 - Part 6: Implement CrashGenerationServer::set_include_context_heap() for OOP crash generation.

https://reviewboard.mozilla.org/r/124228/#review140210

That looks better, thanks!
Attachment #8851970 - Flags: review?(ted) → review+
Sorry about the delay on that final review, the patch escaped my notice.
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #46)
> In my local test on Windows, if we RegisterAppMemory() multiple times, only
> one MDMemoryDescriptor entry is shown in the output of minidump_dump. I
> think MinidumpWriteDump handles duplication of memory regions well.

Update: we have xpcshell test checking that instructions around RIP/EIP is included in the crash dump. The test always fails on Windows 7 with the current patch but succeeds on Windows 10. It seems that Windows 10 is tolerant about overlapping memory regions and Windows 7 is not. Test case test_crashreporter_crash.js almost always fails because some register values point to the stack space of the crashing thread, and breakpad's minidump reader code rejects parsing when seeing overlapping regions.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=8d824806d0fcbc04fcd9472fe187fed88f7cf291 handles pointers to stack and overlapping regions. It passes the tests locally. Let's see if it passes on try.
Ted, can I have your review for the added part 7, especially for the non-trivial code for querying thread stack information? Thanks.
Flags: needinfo?(ted)
review ping.
Comment on attachment 8868977 [details]
Bug 1286802 - Part 7: Don't generate overlapping app memory regions in the crash dump.

https://reviewboard.mozilla.org/r/140650/#review150846

Thanks, this looks good, just a couple of small comments. It would be nice to write a more thorough test for this if you haven't already (it's been a while since I read the entire patch stack). Something where we heap-allocate a some data in one of the crashing functions would probably work fine, but you can land this without having that in place.

::: toolkit/crashreporter/breakpad-client/windows/common/minidump_callback.cc:124
(Diff revision 1)
> +    return false;
> +  }
> +
> +  using FuncPtr = decltype(::NtQueryInformationThread);
> +  FuncPtr* pNtQueryInformationThread =
> +    (FuncPtr*)(::GetProcAddress(::GetModuleHandleW(L"ntdll.dll"),

Can you run this in a setup function somewhere before we crash? I would guess this isn't actually doing anything dangerous, since ntdll.dll is always loaded when the process starts, but I get worried about doing too much work in a crashed process.

::: toolkit/crashreporter/breakpad-client/windows/common/minidump_callback.cc:222
(Diff revision 1)
> +                                  heapAddrCandidates[i],
> +                                  &tmp)) {
> +      continue;
>      }
> +
> +    assert(tmp.ptr && tmp.length);

I would prefer this was not an assert--better to get a minidump without this data than to fail writing the minidump because something unexpected happen.
Attachment #8868977 - Flags: review?(ted) → review+
Flags: needinfo?(ted)
Blocks: 1374245
https://hg.mozilla.org/integration/mozilla-inbound/rev/e6a393dbc7a5521d28d7b7f75c9d1ce6ab57be39
Bug 1286802 - Part 1: Add minidump_callback.{h|cpp} to Windows breakpad client. r=ted

https://hg.mozilla.org/integration/mozilla-inbound/rev/00497fb060654537373df881fee403e5f4445c4e
Bug 1286802 - Part 2: Refactor google_breakpad::ExceptionHandler for including heap regions. r=ted

https://hg.mozilla.org/integration/mozilla-inbound/rev/4f2fce5430f3f1d3996b4b522124af35f50d7946
Bug 1286802 - Part 3: Add empty set_include_context_heap() implementation to ExceptionHandler and CrashGenerationServer. r=ted

https://hg.mozilla.org/integration/mozilla-inbound/rev/e76c0ff35226451fbbb9cd66d02a1d5c6264986c
Bug 1286802 - Part 4: Implemention of including heap regions from the crashing context. r=ted

https://hg.mozilla.org/integration/mozilla-inbound/rev/5c263108f9346908c0c91d31512b725699510d15
Bug 1286802 - Part 5: Implement ExceptionHandler::set_include_context_heap() for inprocess crash generation. r=ted

https://hg.mozilla.org/integration/mozilla-inbound/rev/bc854a17fee60d770e2ffef859ed9f59d9cbfa2d
Bug 1286802 - Part 6: Implement CrashGenerationServer::set_include_context_heap() for OOP crash generation. r=ted

https://hg.mozilla.org/integration/mozilla-inbound/rev/a015b6e3a99657332085de2a92dbc3577a9e0542
Bug 1286802 - Part 7: Don't generate overlapping app memory regions in the crash dump. r=ted
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: