Open Bug 587729 Opened 14 years ago Updated 1 month ago

Do all minidump generation out of process

Categories

(Toolkit :: Crash Reporting, defect)

defect

Tracking

()

People

(Reporter: ted, Unassigned)

References

(Blocks 5 open bugs)

Details

Attachments

(1 file, 2 obsolete files)

Currently when the browser process crashes, we write the minidump in the crashed process. This can lead to malformed minidumps when the address space of the process has been badly trashed. When plugin processes crash, we write the minidump from the browser process, whose memory space is intact, so it's much less likely that the dump will be malformed.

Given that we have these OOP dump generation APIs now, we should be able to restructure the browser process exception handling so that we can write browser process minidumps from another process. I think that the browser process, upon crashing, should exec the crash reporter client (like we do currently), but pass some info to it (on a pipe or whatever), and the crash reporter client can use the Breakpad APIs to write a minidump of its parent process, which would then exit. This has the benefit that even if the browser process's memory is trashed, as long as it can exec the crash reporter we should be able to get a good dump out of it.
Blocks: 527095
I have a prototype implementation working on Linux:
http://hg.mozilla.org/users/tmielczarek_mozilla.com/breakpad-oop-test

I'm going to get this small sample working on Linux/Mac/Win and then look at integrating it into our codebase.
Assignee: nobody → ted.mielczarek
Status: NEW → ASSIGNED
Blocks: 634726
Blocks: 610551
I still get such crashes in firefox, mostly when I'm not at the computer. It'd be nice to get a grip on them. But more importantly, it'd be nice to get a grip on bug 610551
Blocks: 675956, 711568
(In reply to Ted Mielczarek [:ted] from comment #1)
> I have a prototype implementation working on Linux:
> http://hg.mozilla.org/users/tmielczarek_mozilla.com/breakpad-oop-test
> 
> I'm going to get this small sample working on Linux/Mac/Win and then look at
> integrating it into our codebase.

Hi Ted. Any news of where this might be in the pipeline?  Is it contingent on some other bug?
or is Bug 724046 the near term objective?
Neither of them are on my plate currently. I'd love to get something to fix this situation, but it's tricky and I don't really have time at the moment.
See Also: → 696637
No longer blocks: 711568
Blocks: 711568
This is an initial implementation of the "exception handler" process for generating minidump for the chrome process (only on Linux).

It reuses much of the current machinery of crash reporting for child processes: CrashGenerationServer/CrashGenerationClient for OOP dump generation and CrashReporterHost/CrashReporterClient for sending crash annotation and finalizing the report.

The added classes are mostly under toolkit/crashreporter/ipc within the new namespace CrashReporer::ipc.
* A new IPC protocol PExceptionHandler and its concrete actors ExceptionHandlerParent/ExceptionHandlerChild.
* ExceptionHandlerProcessHost, a subclass of mozilla::ipc::GeckoChildProcessHost for launching the exceptionhandler process from the chrome process.
* ExceptionHandlerProcess, a subclass of mozilla::ipc::ProcessChild, for bootstraping the exceptionhandler process.

The chrome process starts a new type of child process: exceptionhandler, which runs CrashGenerationServer and CrashReporterHost. The chrome process acts as a client for crash reporting.

This is how the exceptionhandler process works:

(Chrome process)
* After the chrome process starts (where we crash-annotate "StartupCrash" as 0), if the pref value of "toolkit.crashreporter.exception_handler_process.enabled" is true, we call CrashReporter::EnableExceptionHandlerProcess().
* In CrashReporter::EnableExceptionHandlerProcess(), we create an instance of ExceptionHandlerProcessHost, set up the command line argument, and launch the exceptionhandler process.
* ExceptionHandlerProcessHost::Launch() also opens the child ExceptionHandlerChild actor to establish an IPC channel to the exceptionhandler process.

(exceptionhandler process)
* In XRE_InitChildProcess(), we create an instance of ExceptionHandlerProcess to initialize the process:
** XPCOM, directory service, local file, etc. that are used by the crash reporter.
** Inprocess exception handler for the exceptionhandler process itself and CrashGenrationServer for the chrome process
** Open the ExceptionHandlerParent IPC actor and send the crash notify pipe file descriptor through the IPC channel

(Chrome process)
* Initialize CrashReporterClient and send current crash annotations through the CrashReporterClient instance
* Shut down the in-process exception handler and enable CrashGenerationServer with the received crash notify pipe FD
(Now the exceptionhandler process is ready to handle crash of the chrome process)

When the chrome process crashes:
* The exceptionhandler process generates the dump using CrashGenerationServer and calls OnChildProcessDumpRequested() after the dump file is generated. This part is the same as what the chrome process does for content process crashes.
* The exceptionhandler process creates the crash report using CrashReporterHost::AdoptMinidump() and CrashReporterHost::FinalizeCrashReport().
* After minidump and the .extra file are ready, launch the crashreporter UI and terminates itself.

Some other design decisions that should be revisited:
* On normal shutdown, the chrome process doesn't shut down the exceptionhandler process but only disconnects the IPC channel. The exceptionhandler process will shut itself down after the IPC channel is closed.
* The exceptionhandler process uses a delayed shutdown.
* The exceptionhandler process is reusing Module::ALLOW_IN_GPU_PROCESS XPCOM module selector.

One big issue is that it might be problematic for a child process to read the memory of its parent process. Ubuntu prohibits a process to be ptrace()'d except for its parent process, but Fedora or Debian doesn't look to have this restriction. Windows doesn't look like to be a problem (Google Chrome's crashpad-handler process is also launched from the parent process). The bottom line for Windows is that we send the privileged process handle through the IPC channel. I haven't check this issue on Mac OSX yet, but Mac may have a similar restriction as Ubuntu does.
Another design option is that the first process created by the executable is the exceptionhandler process and make the chrome process its child. This has the advantage that we only need to start the crash client inside the chrome process. This also makes it easier to let the exceptionhandler process handle the crashes of all processes.
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #6)
> One big issue is that it might be problematic for a child process to read
> the memory of its parent process. Ubuntu prohibits a process to be
> ptrace()'d except for its parent process, but Fedora or Debian doesn't look
> to have this restriction. Windows doesn't look like to be a problem (Google
> Chrome's crashpad-handler process is also launched from the parent process).
> The bottom line for Windows is that we send the privileged process handle
> through the IPC channel. I haven't check this issue on Mac OSX yet, but Mac
> may have a similar restriction as Ubuntu does.

macOS should have the same restriction as Ubuntu and I expect more Linux distros to have it in the future. I have the same restriction on my Gentoo installation and didn't set it by hand so I guess it's the default.
You can work around those, though. We actually already do that for the Linux case--the ExceptionHandler code calls `clone()` and ptraces its parent to write the dump, so the parent process calls `prctl(PR_SET_TRACER,...)`:
https://dxr.mozilla.org/mozilla-central/rev/f9a5e9ed62103c84e4cde915f4d08f1ce71be83e/toolkit/crashreporter/breakpad-client/linux/handler/exception_handler.cc#551

For macOS, you just need access to the Mach task port. It's a pain to get that without explicit cooperation from the process involved, but since we control both processes it's not hard, you can just send it over any other Mach port:
https://dxr.mozilla.org/mozilla-central/rev/f9a5e9ed62103c84e4cde915f4d08f1ce71be83e/toolkit/crashreporter/breakpad-client/mac/crash_generation/crash_generation_client.cc#47

Breakpad has a MachIPC file full of helper functions for this stuff.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #9)
> You can work around those, though. We actually already do that for the Linux
> case--the ExceptionHandler code calls `clone()` and ptraces its parent to
> write the dump, so the parent process calls `prctl(PR_SET_TRACER,...)`:
> https://dxr.mozilla.org/mozilla-central/rev/
> f9a5e9ed62103c84e4cde915f4d08f1ce71be83e/toolkit/crashreporter/breakpad-
> client/linux/handler/exception_handler.cc#551
> 
> For macOS, you just need access to the Mach task port. It's a pain to get
> that without explicit cooperation from the process involved, but since we
> control both processes it's not hard, you can just send it over any other
> Mach port:
> https://dxr.mozilla.org/mozilla-central/rev/
> f9a5e9ed62103c84e4cde915f4d08f1ce71be83e/toolkit/crashreporter/breakpad-
> client/mac/crash_generation/crash_generation_client.cc#47
> 
> Breakpad has a MachIPC file full of helper functions for this stuff.

That's what I need. Thanks for the information. Do you see any show stopper for the current design? I'd like to proceed in this direction.
If you end up needing to forward exceptions using Mach (e.g. in the exception handler process itself) note that there's some almost-reusable code at [1]. Breakpad doesn't need to though IIRC, so I don't know if it will come up.

[1] https://dxr.mozilla.org/mozilla-central/source/js/src/ds/MemoryProtectionExceptionHandler.cpp
Updates: make Windows and MacOSX work: crashreporter UI process launched from the exception handler process correctly.

Lots of pending TODOs needs to be addressed.
Attachment #8908495 - Attachment is obsolete: true
Attached patch WIP V3Splinter Review
I still need to make RegisterAppMemory() work OOP (bug 1408473).
Attachment #8912184 - Attachment is obsolete: true
Assignee: ted → cyu
Status: ASSIGNED → NEW
Blocks: 1360392
Blocks: 1245570
Assignee: cervantes.yu → nobody
Severity: normal → S3
See Also: → 1799225
Duplicate of this bug: 942873
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: