Closed Bug 516759 Opened 11 years ago Closed 10 years ago

Electrolysis: crash reporting for multiple processes

Categories

(Toolkit :: Crash Reporting, defect)

x86
Windows NT
defect
Not set

Tracking

()

RESOLVED FIXED
Tracking Status
status1.9.2 --- .4-fixed

People

(Reporter: benjamin, Assigned: cjones)

References

Details

(Whiteboard: [fixed-lorentz])

Attachments

(3 files, 4 obsolete files)

We've got multiple processes, and it would be really nice to hook up crash reporting for them.
Good student-project, Ted can mentor
Keywords: student-project
Basically bug 423745, but Breakpad now has support for this on OS X and Linux as well.
No longer blocks: 525849
I'm going to interpret this bug as covering the low-level breakpad and toolkit/crashreporter changes needed to support OOP crash reporting.
Assignee: nobody → jones.chris.g
Blocks: 531142
No longer blocks: 525850
Worth noting: bug 423745 is orthogonal to this one.
Attached patch breakpad API changes (obsolete) — Splinter Review
I'd like to get a quick review of the proposed API.  The API was designed to be as similar to the Windows crash_generation/ code as practicable/stomach-able.

Quick overview of scheme:
  - the chrome process will create a socketpair() in toolkit/crashreporter code, and then create a CrashGenerationServer to listen to the chrome end of the socketpair
  - the CrashGenerationServer will spawn a pthread to poll() the server end of the socketpair and the server thread's "control pipe"
  - just before forking a child process, chrome code will dup() the client end of the socketpair, and just after forking, remap the dup'd fd to some well-known fd (42 maybe)
  - the child process will create an ExceptionHandler as usual, except that it will pass in the client socketpair fd to the ctor, and this will magically have the ExceptionHandler create and use a CrashGenerationClient instead of the usual code paths
  - when the child crashes or requests a minidump, it will do a little dance on its end of the socketpair and notify the chrome process.  i'll steal this code from chromium, as they have some nice features implemented

Worth noting: the linux CrashGenerationServer notification API is subtly different from the windows API
  - child processes implicitly "connect" to the server when they're forked, so there's no "onconnect" notification
  - we have a separate code path for child process exit notification/cleanup, so there's an "onexiting" notification rather than "onexited".  We might need to make this change in the windows code as well, or at least make it configurable
Attachment #417565 - Flags: review?(ted.mielczarek)
I should add that this API will probably need revising when we have more than one level in the process hierarchy, but we should cross that bridge when we come to it.
The crash_generation/ API was (mostly) shamelessly stolen from the windows implementation, and the meat of the client/server communication was shamelessly stolen from chromium code.

I don't think this patch is quite ready to pushed upstream; I'm not sure how custom child minidump fields should be handled by the parent.  I have some ideas though.
Attachment #417565 - Attachment is obsolete: true
Attachment #417885 - Flags: review?(ted.mielczarek)
Attachment #417565 - Flags: review?(ted.mielczarek)
Notes:
 * the new OOP API is explicitly *not* #ifdef MOZ_IPC, because the OOP code only runs when MOZ_IPC-enabled code asks it to
 * I couldn't test this code on windows because the e10s windows build is broken.  I don't expect to have to make any nontrivial changes to the windows code, which is why I'm posting this for review now.
Attachment #417887 - Flags: review?(ted.mielczarek)
This bug is missing two other parts

  Part 4: Unit tests for OOP crash reporting
  Part 5: Proper handling of child crash reports in the parent

I can do Part 4 easily enough; I think we'll want to use the e10s TestShell infrastructure.

I don't feel very comfortable with Part 5 though.  Ted, do you want to take that?
Unfortunately the backout from bug 535071 completely trashed the breakpad patch.  Not sure how these dependencies should work out.
Just keep working as if that patch was there, if you can. I'm going to fix and reland it ASAP.
Duplicate of this bug: 423745
Attachment #417885 - Flags: review?(ted.mielczarek) → review+
Comment on attachment 417885 [details] [diff] [review]
Implement out-of-process minidump generation for linux

>diff --git a/toolkit/crashreporter/google-breakpad/src/client/linux/crash_generation/Makefile.in b/toolkit/crashreporter/google-breakpad/src/client/linux/crash_generation/Makefile.in
>new file mode 100644
>--- /dev/null
>+++ b/toolkit/crashreporter/google-breakpad/src/client/linux/crash_generation/Makefile.in
>+XPI_NAME 	= crashreporter

I don't think XPI_NAME is actually necessary here.

>+ifeq ($(OS_ARCH),Linux)
>+include $(topsrcdir)/config/config.mk
>+# need this to suppress errors when compiling common/linux/eintr_wrapper.h
>+OS_CXXFLAGS := $(filter-out -pedantic,$(OS_CXXFLAGS))
>+endif

Ew, no. Let's fix that eintr_wrapper code to compile with -pedantic. It's the typeof() it doesn't like, right? There's some other name you can use in GCC that doesn't fail with -pedantic.

>\ No newline at end of file

Add a newline here.

>diff --git a/toolkit/crashreporter/google-breakpad/src/client/linux/crash_generation/client_info.h b/toolkit/crashreporter/google-breakpad/src/client/linux/crash_generation/client_info.h
>new file mode 100644
>--- /dev/null
>+++ b/toolkit/crashreporter/google-breakpad/src/client/linux/crash_generation/client_info.h
>+#endif // ifndef CLIENT_LINUX_CRASH_GENERATION_CLIENT_INFO_H_

Just // CLIENT_LINUX_CRASH_GENERATION_CLIENT_INFO_H_

>diff --git a/toolkit/crashreporter/google-breakpad/src/client/linux/crash_generation/crash_generation_client.cc b/toolkit/crashreporter/google-breakpad/src/client/linux/crash_generation/crash_generation_client.cc
>new file mode 100644
>--- /dev/null
>+++ b/toolkit/crashreporter/google-breakpad/src/client/linux/crash_generation/crash_generation_client.cc
>+#include <algorithm>
>+#include <sys/socket.h>
>+#include <sys/types.h>
>+
>+#include "client/linux/crash_generation/crash_generation_client.h"
>+
>+#include "common/linux/eintr_wrapper.h"
>+#include "common/linux/linux_libc_support.h"
>+#include "common/linux/linux_syscall_support.h"

I believe the preferred ordering of #includes in Breakpad code is:
1) libc stuff
2) libc++ stuff
3) local includes

with one blank line in between each group, and each group ordered alphabetically internally.

>+CrashGenerationClient::CrashGenerationClient(int server_fd) :
>+  server_fd_(server_fd)
>+{
>+}
>+
>+CrashGenerationClient::~CrashGenerationClient()
>+{
>+}

Just put these inline in the class definition if they're going to have empty bodies.

>+
>+bool
>+CrashGenerationClient::RequestDump(const void* blob, size_t blob_size)
>+{
>+  int fds[2];
>+  socketpair(AF_UNIX, SOCK_STREAM, 0, fds);

You probably want sys_socketpair here.

>+
>+  static const unsigned kControlMsgSize = CMSG_SPACE(sizeof(int));
>+
>+  struct kernel_msghdr msg;
>+  my_memset(&msg, 0, sizeof(struct kernel_msghdr));
>+  struct kernel_iovec iov[1];
>+  iov[0].iov_base = const_cast<void*>(blob);
>+  iov[0].iov_len = blob_size;
>+
>+  msg.msg_iov = iov;
>+  msg.msg_iovlen = sizeof(iov) / sizeof(iov[0]);
>+  char cmsg[kControlMsgSize];
>+  memset(cmsg, 0, kControlMsgSize);

my_memset.

>+  msg.msg_control = cmsg;
>+  msg.msg_controllen = sizeof(cmsg);
>+
>+  struct cmsghdr *hdr = CMSG_FIRSTHDR(&msg);

Be consistent in how you place the *, looks like previously you have it snuggled with the type name (I prefer that style anyway).

>+  char b;
>+  HANDLE_EINTR(sys_read(fds[0], &b, 1));

Probably want a comment here indicating that it's waiting for a server response.

>+/*static*/ CrashGenerationClient*

// static
CrashGenerationClient*

>+  // XXX stricter check of |server_fd|
>+  if (0 > server_fd)
>+    return NULL;

I think you should either implement a stricter check or drop the comment.

>diff --git a/toolkit/crashreporter/google-breakpad/src/client/linux/crash_generation/crash_generation_client.h b/toolkit/crashreporter/google-breakpad/src/client/linux/crash_generation/crash_generation_client.h
>new file mode 100644
>--- /dev/null
>+++ b/toolkit/crashreporter/google-breakpad/src/client/linux/crash_generation/crash_generation_client.h
>+  // disable
>+  CrashGenerationClient(const CrashGenerationClient&);
>+  CrashGenerationClient& operator=(const CrashGenerationClient&);

Add a few more words to your comment here.

>+#endif // ifndef CLIENT_LINUX_CRASH_GENERATION_CRASH_GENERATION_CLIENT_H_

#endif // CLIENT_LINUX...

>diff --git a/toolkit/crashreporter/google-breakpad/src/client/linux/crash_generation/crash_generation_server.cc b/toolkit/crashreporter/google-breakpad/src/client/linux/crash_generation/crash_generation_server.cc
>new file mode 100644
>--- /dev/null
>+++ b/toolkit/crashreporter/google-breakpad/src/client/linux/crash_generation/crash_generation_server.cc
>+#include "client/linux/crash_generation/crash_generation_server.h"
>+#include "client/linux/crash_generation/client_info.h"
>+#include "client/linux/handler/exception_handler.h"
>+#include "client/linux/minidump_writer/minidump_writer.h"
>+
>+#include "common/linux/eintr_wrapper.h"
>+#include "common/linux/guid_creator.h"

Just ditch the extra blank line here and order them all alphabetically.

>+static bool
>+FileDescriptorGetInode(ino_t* inode_out, int fd)

Google styleguide sez function names should be verbs, so this should probably read "GetInodeForFileDescriptor" or something like that.

>+// Parse a symlink in /proc/pid/fd/$x and return the inode number of the
>+// socket.
>+//   inode_out: (output) set to the inode number on success
>+//   path: e.g. /proc/1234/fd/5 (must be a UNIX domain socket descriptor)
>+//   log: if true, log messages about failure details

You don't have a log argument.

>+static bool
>+ProcPathGetInode(ino_t* inode_out, const char* path)

I'd rename this function as well.

>+  if (memcmp(kSocketLinkPrefix, buf, sizeof(kSocketLinkPrefix) - 1)) {

I'd prefer an explicit != 0 here.

>+  char *endptr;

Be consistent with your * placement.

>+  const unsigned long long int inode_ul =

Ick, can we use u_int64_t or something here? Google style doesn't like you to use anything other than "int", preferring to use precise sizes in other cases.
http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Integer_Types
http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/google-breakpad/src/google_breakpad/common/breakpad_types.h

>+static bool
>+FindProcessHoldingSocket(pid_t* pid_out, ino_t socket_inode)

>+  while ((dent = readdir(proc))) {
>+    char *endptr;

char*

>+    const unsigned long int pid_ul = strtoul(dent->d_name, &endptr, 10);
>+    if (pid_ul == ULONG_MAX || *endptr)

*endptr != '\0'

>+      ino_t fd_inode;
>+      if (ProcPathGetInode(&fd_inode, buf)) {
>+        if (fd_inode == socket_inode) {

I'd probably just write that with && instead of nested ifs.

>+          if (already_found) {
>+            closedir(fd);
>+            return false;
>+          }

What's the point of this block? You break the loop if you find the inode anyway.

>+namespace google_breakpad {

The above static functions could go inside the namespace, I don't think it makes any difference.

>+
>+CrashGenerationServer::CrashGenerationServer(
>+  const int listen_fd,
>+  OnClientDumpRequestCallback dump_callback,
>+  void* dump_context,
>+  OnClientExitingCallback exit_callback,
>+  void* exit_context,
>+  bool generate_dumps,
>+  const std::string* dump_path) : server_fd_(listen_fd),
>+                                  dump_callback_(dump_callback),
>+                                  dump_context_(dump_context),
>+                                  exit_callback_(exit_callback),
>+                                  exit_context_(exit_context),
>+                                  generate_dumps_(generate_dumps),
>+                                  started_(false)

Google style wants the entire initializer list to start on the next line if it doesn't all fit on one line:
http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Constructor_Initializer_Lists

>+CrashGenerationServer::~CrashGenerationServer()
>+{
>+  if (!started_)
>+    return;
>+
>+  Stop();
>+}

I think I would write this as if (started_) Stop(); unless you expect to wind up doing more work here.

>+
>+bool
>+CrashGenerationServer::Start()
>+{
>+  if (started_ || 0 > server_fd_)
>+    return false;
>+
>+  int control_pipe[2];
>+  // XXX use pipe2() where available
>+  if (pipe(control_pipe))
>+    return false;

Any particular reason to care about fixing this? You can easily add a configure check to Mozilla, and just ignore it for other situations, but maybe it's not worth it. Also, I really do prefer explicit == 0, although that's not in the Google style guide.

>+  // XXX creating joinable; should this be configurable?
>+  if (pthread_create(&thread_, NULL,
>+                     ThreadMain, reinterpret_cast<void*>(this)))

Fix it or don't, but don't leave the comment in.

>+/*static*/ bool
>+CrashGenerationServer::CreateReportChannel(int* server_fd, int* client_fd)

// static
bool

>+  static const int on = 1;

No reason this needs to be static, right?

>+void
>+CrashGenerationServer::Run()
>+{

>+    if (-1 == nevents)
>+      if (EINTR == errno)
>+	continue;
>+      else
>+	return;

It looks like you've got some literal tab characters in here. You should untabify. Also, if you have nested ifs, you should brace them.

>+bool
>+CrashGenerationServer::ClientEvent(short revents)

>+  // Walk the control payload an extract the file descriptor and validated pid.

"and"

>+  // Kernel bug workaround (broken in 2.6.30 at least):

Yuck. Did you copy this from Chromium code? It's pretty awful, but I guess there's not much you can do about that.

>+/*static*/ void*
>+CrashGenerationServer::ThreadMain(void *arg)

// static
void*
CrashGenerationServer::ThreadMain(void* arg)

>diff --git a/toolkit/crashreporter/google-breakpad/src/client/linux/crash_generation/crash_generation_server.h b/toolkit/crashreporter/google-breakpad/src/client/linux/crash_generation/crash_generation_server.h
>new file mode 100644
>--- /dev/null
>+++ b/toolkit/crashreporter/google-breakpad/src/client/linux/crash_generation/crash_generation_server.h

>+#include <pthread.h>
>+#include <string>

Add a blank line between these.

>+class CrashGenerationServer {
>+public:
>+  // !!!WARNING!!!: callbacks may be invoked on a different thread
>+  // !!!than that which creates the CrashGenerationServer.  They must
>+  // !!!be thread safe.

I don't know that you need quite so many exclamation points here.


>+  // Create an instance with the given parameters.
>+  //

Your param list doesn't seem to match reality. Clean that up?

>+private:
>+  void Run();
>+  // true == "keep running", false == "exit"
>+  bool ClientEvent(short revents);
>+  bool ControlEvent(short revents);
>+  bool MakeMinidumpFilename(std::string& outFilename);
>+
>+  static void* ThreadMain(void* arg);

You should add brief descriptive comments to each of these.

>+#endif // ifndef CLIENT_LINUX_CRASH_GENERATION_CRASH_GENERATION_SERVER_H_

#endif // CLIENT_LINUX...

>diff --git a/toolkit/crashreporter/google-breakpad/src/client/linux/handler/exception_handler.cc b/toolkit/crashreporter/google-breakpad/src/client/linux/handler/exception_handler.cc
>--- a/toolkit/crashreporter/google-breakpad/src/client/linux/handler/exception_handler.cc
>+++ b/toolkit/crashreporter/google-breakpad/src/client/linux/handler/exception_handler.cc

>+ExceptionHandler::ExceptionHandler(const std::string& dump_path,
>+                                   FilterCallback filter,
>+                                   MinidumpCallback callback,
>+                                   void* callback_context,
>+                                   bool install_handler,
>+                                   const int server_fd) :
>+  filter_(filter),
>+  callback_(callback),
>+  callback_context_(callback_context),
>+  handler_installed_(install_handler)

Keep the colon where it was. Also, be consistent with the existing style about the & in reference types.

>+void ExceptionHandler::Init(const std::string &dump_path,
>+                            const int server_fd)
>+{
>+  crash_handler_ = NULL;
>+
>+  if (0 <= server_fd)
>+    crash_generation_client_.reset(CrashGenerationClient::TryCreate(server_fd));

This line goes over 80 characters. You should massage it to fit.

That's a lot of comments, but they're all just nit-picking, so I'm giving you r+.
> >--- /dev/null
> >+++ b/toolkit/crashreporter/google-breakpad/src/client/linux/crash_generation/crash_generation_server.cc
> >+          if (already_found) {
> >+            closedir(fd);
> >+            return false;
> >+          }
> 
> What's the point of this block? You break the loop if you find the inode
> anyway.
> 

The control flow here is a little funky, but this block can be reached if this function is passed an fd that's opened by more than one process.  This appears to be an sanity/error/security check.

> >+  static const int on = 1;
>
> No reason this needs to be static, right?
>

This is probably done to ensure that the lifetime of the option's value is at least as long as that of the socket, so that it can be read after the stack frame goes away.  The man page says

Optname and any specified options  are  passed  uninterpreted  to  the
       appropriate  protocol  module  for  interpretation.

TBH, I know if it's always necessary (and kinda doubt it), but I feel queasy about changing it.

> >+  // Kernel bug workaround (broken in 2.6.30 at least):
>
> Yuck. Did you copy this from Chromium code? It's pretty awful, but I guess
> there's not much you can do about that.

Yes.

Would you like to take another look at the nits-picked patch?
Depends on: 538642
Fixed two typos, nothing major.
Attachment #417887 - Attachment is obsolete: true
Attachment #417887 - Flags: review?(ted.mielczarek)
Comment on attachment 420801 [details] [diff] [review]
Part 3: Have Gecko utilize OOP crash reporting (updated)

Oops, still want review though ;).
Attachment #420801 - Flags: review?(ted.mielczarek)
Comment on attachment 417885 [details] [diff] [review]
Implement out-of-process minidump generation for linux

>diff --git a/toolkit/crashreporter/google-breakpad/src/client/linux/crash_generation/crash_generation_server.cc b/toolkit/crashreporter/google-breakpad/src/client/linux/crash_generation/crash_generation_server.cc
>new file mode 100644
>--- /dev/null
>+++ b/toolkit/crashreporter/google-breakpad/src/client/linux/crash_generation/crash_generation_server.cc
>+bool
>+CrashGenerationServer::ClientEvent(short revents)
>+{
[snip]
>+  // Send the done signal to the process: it can exit now.
>+  memset(&msg, 0, sizeof(msg));
>+  struct iovec done_iov;
>+  done_iov.iov_base = const_cast<char*>("\x42");
>+  done_iov.iov_len = 1;
>+  msg.msg_iov = &done_iov;
>+  msg.msg_iovlen = 1;
>+
>+  HANDLE_EINTR(sendmsg(signal_fd, &msg, MSG_DONTWAIT | MSG_NOSIGNAL));
>+  HANDLE_EINTR(close(signal_fd));
>+
>+  if (dump_callback_) {
>+    ClientInfo info;
>+
>+    info.crash_server_ = this;
>+    info.pid_ = crashing_pid;
>+
>+    dump_callback_(dump_context_, &info, &minidump_filename);
>+  }
>+

In the version I commit, the order of these two events will be reversed.  For Mozilla, we want to invoke the dump callback while the client is still blocked to avoid racing with the "process crashed" notification.  More generally, this brings this code to parity with the Windows crash_generation_server, and is more useful anyway because then the dump callback can query information about the crashing child (e.g., look up info in /proc/[child-pid]).
Comment on attachment 417885 [details] [diff] [review]
Implement out-of-process minidump generation for linux

Can you file a followup or something on figuring out a way to make this compile with -pedantic? That filter bit in the makefile still bothers me. Alternately, you could make a more compelling argument than I have in bug 394311. :)
Comment on attachment 417886 [details] [diff] [review]
Part 2: Refactor nsExceptionHandler in preparation for OOP crash handling

I don't get the point of this patch. Are you planning on writing child dumps to a temp directory always? That doesn't seem like the right decision. The in-process dumper defaults to $TEMP, but once we get a profile directory we write to $profile/minidumps. (See SetMinidumpPath in this file.) If you need to get that path, you can just grab it from gExceptionHandler's dump_path() method:
http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/google-breakpad/src/client/windows/handler/exception_handler.h#173
Attachment #417886 - Flags: review?(ted.mielczarek) → review-
Blocks: 539048
Comment on attachment 420801 [details] [diff] [review]
Part 3: Have Gecko utilize OOP crash reporting (updated)

>diff --git a/ipc/app/MozillaRuntimeMain.cpp b/ipc/app/MozillaRuntimeMain.cpp
>--- a/ipc/app/MozillaRuntimeMain.cpp
>+++ b/ipc/app/MozillaRuntimeMain.cpp
>@@ -49,16 +49,28 @@
> #include <windows.h>
> // we want a wmain entry point
> #include "nsWindowsWMain.cpp"
> #endif
> 
> int
> main(int argc, char* argv[])
> {
>+#if defined(MOZ_CRASHREPORTER)
>+#  if defined(XP_WIN)
>+    if (argc < 2)
>+        return 1;
>+    if (!XRE_SetRemoteExceptionHandler(argv[--argc]))
>+        return 1;
>+#  else
>+    if (!XRE_SetRemoteExceptionHandler())
>+        return 1;
>+#  endif   
>+#endif // if defined(MOZ_CRASHREPORTER)

Is indenting the preprocessor directives the normative style nowadays? (I honestly don't know.)

Also, I feel like perhaps we should #error on unsupported platforms, but I'm unsure. We should at least have that in an #elif, not a #else, so that we don't run this code on OS X or whatever if someone is building --enable-ipc.

>diff --git a/ipc/glue/GeckoChildProcessHost.cpp b/ipc/glue/GeckoChildProcessHost.cpp
>--- a/ipc/glue/GeckoChildProcessHost.cpp
>+++ b/ipc/glue/GeckoChildProcessHost.cpp
>@@ -41,16 +41,21 @@
> #include "base/command_line.h"
> #include "base/path_service.h"
> #include "base/string_util.h"
> #include "chrome/common/chrome_switches.h"
> #include "chrome/common/process_watcher.h"
> 
> #include "prprf.h"
> 
>+#if defined(OS_LINUX)
>+#  define XP_LINUX 1
>+#endif
>+#include "nsExceptionHandler.h"

Where does OS_LINUX come from? I kind of wish we had that (or XP_LINUX) defined from configure.


>+#if defined(MOZ_CRASHREPORTER)
>+  cmdLine.AppendLooseValue(
>+    UTF8ToWide(CrashReporter::GetChildNotificationPipe().BeginReading()));

Does this line push past 80 characters?

>diff --git a/toolkit/crashreporter/nsExceptionHandler.cpp b/toolkit/crashreporter/nsExceptionHandler.cpp
>--- a/toolkit/crashreporter/nsExceptionHandler.cpp
>+++ b/toolkit/crashreporter/nsExceptionHandler.cpp
>+// OOP crash reporting
>+static CrashGenerationServer* sCrashServer; // chrome process has this
>+
>+#if defined(XP_WIN)
>+static nsCString* sChildCrashNotifyPipe;
>+
>+#elif defined(XP_LINUX)
>+static int sServerSocketFd = -1;
>+static int sClientSocketFd = -1;
>+static const int kMagicChildCrashReportFd = 42;
>+#endif

Ditch the leading 's' in the names, it's not the style in this file, anyway.

>+// Out-of-process crash reporting API wrappers
>+static void
>+OnChildProcessDumpRequested(void* aContext,
>+                            const ClientInfo* aClientInfo,
>+#if defined(XP_WIN)
>+                            const std::wstring*
>+#else
>+                            const std::string*
>+#endif
>+                              aFilePath)

Is aFilePath required to be a pointer by the Breakpad classes you're using, or could you make it a const reference instead?

>+static bool
>+OOPInitialized()
>+{
>+  return !!sCrashServer;

File style (my style) is foo != NULL.

>+static void
>+OOPInit()
>+{
>+  NS_ABORT_IF_FALSE(!OOPInitialized(), "multiple init()");

Seems a bit unnecessary to abort here, I think. As discussed on IRC, aborting the parent process is probably not a great thing to ship.

>+
>+#if defined(XP_WIN)
>+  // this is a CString to make it more convenient to pass on the
>+  // command line
>+  sChildCrashNotifyPipe = 
>+    new nsCString("\\\\.\\pipe\\gecko-crash-server-pipe.");
>+  long pid = static_cast<long>(::GetCurrentProcessId());
>+  sChildCrashNotifyPipe->AppendInt(pid);
>+
>+  std::wstring dumpDir(GetMinidumpTempDir());

cjones assures me that he's updating this patch to not use the temp dir.

>diff --git a/toolkit/xre/nsEmbedFunctions.cpp b/toolkit/xre/nsEmbedFunctions.cpp
>--- a/toolkit/xre/nsEmbedFunctions.cpp
>+++ b/toolkit/xre/nsEmbedFunctions.cpp
>+#if defined(MOZ_CRASHREPORTER)
>+PRBool
>+XRE_SetRemoteExceptionHandler(const char* aPipe)

Maybe const char* aPipe/* = 0 */) would be nice as a reminder.

Overall looks good, I'd probably like to take another pass when you update the patch.
Attachment #420801 - Flags: review?(ted.mielczarek) → review+
(In reply to comment #19)
> (From update of attachment 417885 [details] [diff] [review])
> Can you file a followup or something on figuring out a way to make this compile
> with -pedantic? That filter bit in the makefile still bothers me. Alternately,
> you could make a more compelling argument than I have in bug 394311. :)

I'm generally in favor of -pedantic, but this isn't the only place we filter it ... also in ipc/chromium (IIRC also for HANDLE_EINTR()).  Would you be satisfied with me noting both of these sites in bug 394311?
Yes, that's fine.
(In reply to comment #21)
> (From update of attachment 420801 [details] [diff] [review])
> >diff --git a/ipc/glue/GeckoChildProcessHost.cpp b/ipc/glue/GeckoChildProcessHost.cpp
> >--- a/ipc/glue/GeckoChildProcessHost.cpp
> >+++ b/ipc/glue/GeckoChildProcessHost.cpp
> >@@ -41,16 +41,21 @@
> > #include "base/command_line.h"
> > #include "base/path_service.h"
> > #include "base/string_util.h"
> > #include "chrome/common/chrome_switches.h"
> > #include "chrome/common/process_watcher.h"
> > 
> > #include "prprf.h"
> > 
> >+#if defined(OS_LINUX)
> >+#  define XP_LINUX 1
> >+#endif
> >+#include "nsExceptionHandler.h"
> 
> Where does OS_LINUX come from? I kind of wish we had that (or XP_LINUX) defined
> from configure.
> 

We get that from the chromium config.  configure doesn't define XP_LINUX, and we work around that elsewhere (in crashreporter too).  OK to fix this in a followup bug?

> >+// Out-of-process crash reporting API wrappers
> >+static void
> >+OnChildProcessDumpRequested(void* aContext,
> >+                            const ClientInfo* aClientInfo,
> >+#if defined(XP_WIN)
> >+                            const std::wstring*
> >+#else
> >+                            const std::string*
> >+#endif
> >+                              aFilePath)
> 
> Is aFilePath required to be a pointer by the Breakpad classes you're using, or
> could you make it a const reference instead?
> 

Yes, this is the Windows interface, and I wanted to be consistent with that for Linux.

> >+static bool
> >+OOPInitialized()
> >+{
> >+  return !!sCrashServer;
> 
> File style (my style) is foo != NULL.
> 
> >+static void
> >+OOPInit()
> >+{
> >+  NS_ABORT_IF_FALSE(!OOPInitialized(), "multiple init()");
> 
> Seems a bit unnecessary to abort here, I think. As discussed on IRC, aborting
> the parent process is probably not a great thing to ship.
> 

Well, this is DEBUG-only abort.  I thought we had discussed RUNTIMEABORT() on IRC, which also aborts in OPT builds.
Removed dependency on GetMinidumpTmpDir() and added some hackery to make OOP crash reporting respect in-process enabledness.
Attachment #417886 - Attachment is obsolete: true
Attachment #420801 - Attachment is obsolete: true
Attachment #421139 - Flags: review?(ted.mielczarek)
(In reply to comment #24)
> We get that from the chromium config.  configure doesn't define XP_LINUX, and
> we work around that elsewhere (in crashreporter too).  OK to fix this in a
> followup bug?

Yeah, that's fine.

> Well, this is DEBUG-only abort.  I thought we had discussed RUNTIMEABORT() on
> IRC, which also aborts in OPT builds.

You're right. Debug-only aborts are fine.
Comment on attachment 421139 [details] [diff] [review]
Part 2 v2 (was parts 2 and 3 previously): address review comments and respect crashreporter enabled-ness

>diff --git a/ipc/app/MozillaRuntimeMain.cpp b/ipc/app/MozillaRuntimeMain.cpp
>--- a/ipc/app/MozillaRuntimeMain.cpp
>+++ b/ipc/app/MozillaRuntimeMain.cpp
>+    if (strcmp("-", crashReporterArg)

I prefer strcmp() == 0

>+#  else
>+#    error Unsupported

Can you make that error message a little more descriptive?

>diff --git a/toolkit/crashreporter/nsExceptionHandler.cpp b/toolkit/crashreporter/nsExceptionHandler.cpp
>--- a/toolkit/crashreporter/nsExceptionHandler.cpp
>+++ b/toolkit/crashreporter/nsExceptionHandler.cpp
>+static void
>+OOPInit()
>+{
>+  NS_ABORT_IF_FALSE(!OOPInitialized(), "multiple init()");
>+  NS_ABORT_IF_FALSE(!!gExceptionHandler, "in-process crashreporter uninit");

Make these slightly more verbose, please.

>diff --git a/toolkit/xre/nsEmbedFunctions.cpp b/toolkit/xre/nsEmbedFunctions.cpp
>--- a/toolkit/xre/nsEmbedFunctions.cpp
>+++ b/toolkit/xre/nsEmbedFunctions.cpp
>+#else
>+#  error Unsupported

This too.

r=me with those nits fixed.
Attachment #421139 - Flags: review?(ted.mielczarek) → review+
Keywords: student-project
This fixes the tinderbox leak of one nsStringBuffer. I'll file a followup about creating and calling a new OOPShutdown method from UnsetExceptionHandler to actually clean this up.
Attachment #421448 - Flags: review?(ted.mielczarek)
Comment on attachment 421448 [details] [diff] [review]
Fix nsStringBuffer leakstats, rev. 1

r=me if you file that bug first, and include the bug number in that "intentionally leaked" comment.
Attachment #421448 - Flags: review?(ted.mielczarek) → review+
Blocks: 539451
Depends on: 539290
Blanket approval for Lorentz merge to mozilla-1.9.2
a=beltzner for 1.9.2.4 - please make sure to mark status1.9.2:.4-fixed
You need to log in before you can comment on or make changes to this bug.