Last Comment Bug 581341 - Remote CrashReporter::AnnotateCrashReport from child processes
: Remote CrashReporter::AnnotateCrashReport from child processes
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: IPC (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla10
Assigned To: Josh Matthews [:jdm]
:
Mentors:
Depends on: 605832 668385 691424
Blocks: 636081 639725 641601 616134 684500 693467
  Show dependency treegraph
 
Reported: 2010-07-23 06:27 PDT by Ted Mielczarek [:ted.mielczarek]
Modified: 2011-10-11 13:18 PDT (History)
7 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
8+


Attachments
Make crash report annotation work OOP and subsume existing workarounds. (42.53 KB, patch)
2011-03-25 12:22 PDT, Josh Matthews [:jdm]
no flags Details | Diff | Splinter Review
Make crash report annotation work OOP and subsume existing workarounds. (50.53 KB, patch)
2011-03-28 09:28 PDT, Josh Matthews [:jdm]
cjones.bugs: review-
ted: review-
Details | Diff | Splinter Review
Make crash report annotation work OOP and subsume existing workarounds. (64.18 KB, patch)
2011-06-01 08:29 PDT, Josh Matthews [:jdm]
ted: review+
Details | Diff | Splinter Review
Part 4: Disallow re-entry of IPC xpcshell command evaluation. (5.26 KB, patch)
2011-06-14 16:11 PDT, Josh Matthews [:jdm]
bent.mozilla: review-
Details | Diff | Splinter Review
Part 3: Disable alternate signal handlers if crash reporter is enabled. (1.42 KB, patch)
2011-06-14 16:17 PDT, Josh Matthews [:jdm]
benjamin: review-
Details | Diff | Splinter Review
Part 1: Make crash report annotation work OOP and subsume existing workarounds. (49.56 KB, patch)
2011-06-15 06:30 PDT, Josh Matthews [:jdm]
cjones.bugs: review+
Details | Diff | Splinter Review
Part 2: Hook up CrashReporter functions to IPC. (6.58 KB, patch)
2011-06-15 06:31 PDT, Josh Matthews [:jdm]
no flags Details | Diff | Splinter Review
Part 5: Always run the IPC testshell callback, regardless of execution success or failure. (4.49 KB, patch)
2011-06-15 06:31 PDT, Josh Matthews [:jdm]
no flags Details | Diff | Splinter Review
Part 6: Test. (7.07 KB, patch)
2011-06-15 06:33 PDT, Josh Matthews [:jdm]
no flags Details | Diff | Splinter Review
Part 2: Hook up CrashReporter functions to IPC. (5.68 KB, patch)
2011-06-15 06:37 PDT, Josh Matthews [:jdm]
no flags Details | Diff | Splinter Review
Part 5: Always run the IPC testshell callback, regardless of execution success or failure. (4.49 KB, patch)
2011-06-15 12:56 PDT, Josh Matthews [:jdm]
bent.mozilla: review-
Details | Diff | Splinter Review
Part 1: Make crash report annotation work OOP and subsume existing workarounds. (51.05 KB, patch)
2011-06-23 16:30 PDT, Josh Matthews [:jdm]
no flags Details | Diff | Splinter Review
Part 2: Hook up CrashReporter functions to IPC. (5.68 KB, patch)
2011-06-23 16:30 PDT, Josh Matthews [:jdm]
no flags Details | Diff | Splinter Review
Part 3: Install child process exception handler before activating breakpad. (2.43 KB, patch)
2011-06-23 16:31 PDT, Josh Matthews [:jdm]
benjamin: review+
Details | Diff | Splinter Review
Part 5: Always run the IPC testshell callback, regardless of execution success or failure. (4.86 KB, patch)
2011-06-23 16:32 PDT, Josh Matthews [:jdm]
bent.mozilla: review+
Details | Diff | Splinter Review
Part 6: Test. (7.27 KB, patch)
2011-06-23 16:33 PDT, Josh Matthews [:jdm]
no flags Details | Diff | Splinter Review
Part 1: Make crash report annotation work OOP and subsume existing workarounds. (54.40 KB, patch)
2011-07-08 07:56 PDT, Josh Matthews [:jdm]
no flags Details | Diff | Splinter Review
Part 2: Hook up CrashReporter functions to IPC. (6.75 KB, patch)
2011-07-08 07:59 PDT, Josh Matthews [:jdm]
ted: review+
Details | Diff | Splinter Review
Hack around an MSVC PGO bug. (1.16 KB, patch)
2011-09-01 11:12 PDT, Josh Matthews [:jdm]
no flags Details | Diff | Splinter Review
Part 1: Make crash report annotation work OOP and subsume existing workarounds. (54.04 KB, patch)
2011-09-13 11:47 PDT, Josh Matthews [:jdm]
no flags Details | Diff | Splinter Review
Part 2: Hook up CrashReporter functions to IPC. * * (7.21 KB, patch)
2011-09-13 11:47 PDT, Josh Matthews [:jdm]
ted: review+
Details | Diff | Splinter Review
Part 3: Install child process exception handler before activating breakpad. (2.33 KB, patch)
2011-09-13 11:48 PDT, Josh Matthews [:jdm]
no flags Details | Diff | Splinter Review
Part 5: Always run the IPC testshell callback, regardless of execution success or failure. (4.99 KB, patch)
2011-09-13 11:49 PDT, Josh Matthews [:jdm]
bent.mozilla: review+
Details | Diff | Splinter Review
Part 6: Test. (7.24 KB, patch)
2011-09-13 11:49 PDT, Josh Matthews [:jdm]
no flags Details | Diff | Splinter Review
Hack around an MSVC PGO bug. (1.73 KB, patch)
2011-09-13 11:50 PDT, Josh Matthews [:jdm]
ted: review+
Details | Diff | Splinter Review
Part 2: Hook up CrashReporter functions to IPC. * * (8.44 KB, patch)
2011-09-27 09:05 PDT, Josh Matthews [:jdm]
no flags Details | Diff | Splinter Review
Part 2: Hook up CrashReporter functions to IPC. * * (8.49 KB, patch)
2011-09-27 09:12 PDT, Josh Matthews [:jdm]
ted: review+
Details | Diff | Splinter Review

Description Ted Mielczarek [:ted.mielczarek] 2010-07-23 06:27:57 PDT
In bug 578952 I noted that CrashReporter::AnnotateCrashReport won't work from child processes. This is something that we'll probably want to support, especially for content processes. I don't know that any of the other crash reporting APIs would be useful in child processes, except maybe AppendAppNotesToCrashReport.

http://mxr.mozilla.org/mozilla-central/source/xpcom/system/nsICrashReporter.idl
Comment 1 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-02-17 11:18:32 PST
With the PCrashReporter framework, this should be pretty easy to add, and generally across all subprocess types.  I definitely wouldn't recommend blocking fennec-2.0 on this, but I would strongly recommend taking it if someone gets time to hack on it.

If we get a patch in the ff4 timeframe, it's going to need to be written so as not to disturb PPluginModule.  Post-ff4, the two should share most code.
Comment 2 Ted Mielczarek [:ted.mielczarek] 2011-02-17 11:23:34 PST
Yeah, that code from bug 605832 should do most of the heavy lifting here.
Comment 3 Josh Matthews [:jdm] 2011-03-09 13:23:17 PST
I've got ideas for how to implement this.
Comment 4 Josh Matthews [:jdm] 2011-03-25 12:22:25 PDT
Created attachment 521901 [details] [diff] [review]
Make crash report annotation work OOP and subsume existing workarounds.

This is what I'm aiming for right now, but it needs some more testing before I ask for review. Let me know if I'm overreaching, since I decided to subsume the existing PluginModule workaround?
Comment 5 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-03-25 12:28:35 PDT
Please do subsume PluginModule; the concerns in comment 1 are made obsolete by having shipped ff4.
Comment 6 Josh Matthews [:jdm] 2011-03-28 09:28:58 PDT
Created attachment 522378 [details] [diff] [review]
Make crash report annotation work OOP and subsume existing workarounds.

Alright, I think it's game time for this patch.  I ended up touching the testshell because I discovered that I could create remote content crash xpcshell tests without too much effort.  I've tried out crashing Fennec proper, and the graphics annotations show up just fine.  I also fixed up bug 619227 while I was in the area, since I really didn't like having to find the environnment variable to set when I wanted the crash reporter to appear.
Comment 7 Josh Matthews [:jdm] 2011-03-28 09:52:03 PDT
This will need another revision to fix some build problems with MOZ_CRASHREPORTER turned off, but that shouldn't impede functional review.
Comment 8 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-03-29 13:04:20 PDT
Comment on attachment 522378 [details] [diff] [review]
Make crash report annotation work OOP and subsume existing workarounds.

The life cycle that's evolved so far is (read "*" and "?" as Kleene
closure and option)
 - create PToplevel
 - PTopLevelChild->SendPCrashReporterCtor
 - (annotate|app notes|...)*
 - (hang, take paired dump)?
 - PToplevelParent::ActorDestroy
   * compute process-specific annotations
   * compute common annotations
   * if hang dumps exist, annotate use them
   * else try TakeMinidump; if exists annotate and use
   * fire off process-specific notification of crash/no-crash

There are two platform-specific hairs on the process
 - "child-process main-thread ID", needed in generating hang dumps
 - android lib mapping stuff

AFAICT all this can be fit into PCrashReporter; I think your
ProcessReporter interface is just CrashReporterParent.  Furthermore, I
think the hang-dump generation code should be moved into
CrashReporterParent.  We'll need that for PContent pretty soon, I
think, when we start dealing with windows BS there.  It's fine with me
if that goes into a followup, but if you're in a refactorin' mood now
would be a good time.

Minor engineering issues
 - annotating the process type can stay entirely in PCrashReporter;
   CrashReporterChild knows what process it's running in
 - CrashReporterChild knows that the child-process main thread is; can
   manage that info too
 - CrashReporterChild can manage the hang-dump IDs and the crash-dump
   ID
 - CrashReporterParent will want interfaces like

// Adds generic annotations
template<class Toplevel>
bool GeneratePairedMinidumps(Toplevel* t, const AnnotationTable& processSpecific);

bool GetPairedMinidumps(childDump, parentDump);

// Adds generic annotations
template<class Toplevel>
bool GetCrashMinidump(Toplevel* t, const AnnotationTable& processSpecific);

The Toplevel parameterization allows us to use a kind of duck typing
in CrashReporterParent; it can just call
t->OtherProcess()/t->TakeMinidump()/etc. and the compiler will catch
those not being exposed or typed correctly.  Otherwise we'd need
another vtable through which to make these calls, for no gain IMHO.
All of the code needed for this is generic but lives in
PluginModuleParent atm.

 - PToplevelParent::ActorDestroy would use, GetPairedMinidumps() and
   GetCrashMinidump(), and look a lot like PluginModuleParent today.

Does this all make sense?  What you have now is looking good, but I'd
like to steer it down a slightly different path.
Comment 9 Ted Mielczarek [:ted.mielczarek] 2011-04-01 11:31:07 PDT
Comment on attachment 522378 [details] [diff] [review]
Make crash report annotation work OOP and subsume existing workarounds.

Why are you adding GetOOPEnabled? Why not just restore GetEnabled() to work in either process? That way callers don't have to check both, they can just use GetEnabled, since the individual APIs will work in either process.

Aside from that the changes look alright, I'll leave the IPC bits to cjones.
Comment 10 Josh Matthews [:jdm] 2011-04-01 11:37:09 PDT
I added GetOOPEnabled because most of the API isn't supposed to work in child processes, and the GetEnabled check makes that work right now.
Comment 11 Josh Matthews [:jdm] 2011-06-01 08:29:18 PDT
Created attachment 536629 [details] [diff] [review]
Make crash report annotation work OOP and subsume existing workarounds.

I believe this is more in line with what you had in mind, Chris. Ted, I haven't made any changes to the infrastructure you didn't like last time, but I have explained my reasoning since then, so take another look with that in mind, please?
Comment 12 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-06-06 11:32:53 PDT
Comment on attachment 536629 [details] [diff] [review]
Make crash report annotation work OOP and subsume existing workarounds.

This is a sweet patch.  Cheers.

>+#ifdef MOZ_CRASHREPORTER
>+    return new CrashReporterChild;

Please use |new CrashReporterChild()|.  Doesn't matter here but this
form is for specialized uses and unnecessarily drew my attention.

>+#ifdef MOZ_CRASHREPORTER
>+void
>+ContentParent::WriteExtraDataForMinidump(CrashReporter::AnnotationTable& notes)

In general, prefer passing data instead of callbacks (I'm sr here so
can pontificate.  Woo!).  Callbacks are bad because they make the
behavior of your implementation include the behaviors of the
transitive closure of all the callbacks passed to the interface.  This
means you need to worry about nasty things like reentry etc.  Just say
no unless there's a compelling reason otherwise; having interface
users pass down an AnnotationTable isn't more code, is cheap enough
and this isn't perf-sensitive code anyway.  At least, if it were, we
would have serious problems ;).

>+    friend class CrashReporterParent;

|friend|ing for WriteExtraDataForMinidump?

>+    template<class Toplevel>
>+    static PCrashReporterChild* CreateCrashReporter(Toplevel* actor) {

The return value isn't used AFAICT, just make it void.  Include a
hard-assert here that this is only called once, or at least only
called when Toplevel doesn't already have a crash reporter.

>diff --git a/dom/ipc/CrashReporterParent.h b/dom/ipc/CrashReporterParent.h
>--- a/dom/ipc/CrashReporterParent.h
>+++ b/dom/ipc/CrashReporterParent.h

For template-heavy .h's with non-trivial function definitions, it's
easier to scan interfaces when there are separate decls and (inline,
of course) defitions.  Move the impls to the bottom of the file, or if
you want to be really stylish add CrashReporterParent-inl.h and put
the defs there.

>+  GenerateHangMinidump(Toplevel* t)
>+  GenerateCrashMinidump(Toplevel* t)

Hmm ... I think there's value in distinguishing these from
GeneratePairedMinidumps(), since they're creating different things.
These two are really making crash reports, and the former is indeed
creating paired minidumps.  I think s/Minidump/CrashReport/ would be
fine here.

My eyes kind of glazed over while I read the PluginModuleParent
changes (sorry, flying on one hour of ~sleep).  Let's get one more
patch with the above fixed.

It would be easier for ted and me if you separated the
toolkit/crashreporter stuff from the dom/ipc stuff.
Comment 13 Ted Mielczarek [:ted.mielczarek] 2011-06-08 10:41:10 PDT
Comment on attachment 536629 [details] [diff] [review]
Make crash report annotation work OOP and subsume existing workarounds.

Review of attachment 536629 [details] [diff] [review]:
-----------------------------------------------------------------

I only have one issue with this patch, the rest looks okay. (Although I skipped most of the IPC parts, of course.)

::: testing/xpcshell/xpcshell.ini
@@ +83,5 @@
>  
> +[include:toolkit/crashreporter/test/unit_ipc/xpcshell.ini]
> +skip-if.os = linux
> +run-if.config = mozcrashreporter
> +run-if.config = ipc

I'm not sure that this actually works, FWIW. (You have duplicated keys, for one, and didn't we remove --disable-ipc already?)

::: toolkit/crashreporter/nsExceptionHandler.cpp
@@ +1127,4 @@
>  
> +nsresult AnnotateCrashReport(const nsACString& key, const nsACString& data)
> +{
> +  if (!GetEnabled() && !GetOOPEnabled())

This kind of sucks. Can we just make "GetEnabled" return true for either kind of process, and then have "isOOP" or something if things need to check that specifically? Otherwise all our if (GetEnabled()) checks are going to wind up looking like this.

::: toolkit/crashreporter/test/unit_ipc/test_content_annotation.js
@@ +1,2 @@
> +load("../unit/head_crashreporter.js");
> +

Thanks for including a test! Can you test AppendAppNotes as well here?

::: toolkit/xre/nsSigHandlers.cpp
@@ +261,5 @@
>  
>  #if defined(CRAWL_STACK_ON_SIGSEGV)
> +  if (!CrashReporter::GetEnabled() &&
> +      !CrashReporter::GetOOPEnabled() &&
> +      !getenv("XRE_NO_WINDOWS_CRASH_DIALOG")) {

Is this actually related, or just extraneous? (You'll need someone else to review it if it's needed.)
Comment 14 Josh Matthews [:jdm] 2011-06-14 16:11:59 PDT
Created attachment 539354 [details] [diff] [review]
Part 4: Disallow re-entry of IPC xpcshell command evaluation.
Comment 15 Josh Matthews [:jdm] 2011-06-14 16:17:59 PDT
Created attachment 539359 [details] [diff] [review]
Part 3: Disable alternate signal handlers if crash reporter is enabled.
Comment 16 Josh Matthews [:jdm] 2011-06-14 16:26:37 PDT
The patches I'm uploading now are scaffolding that seems to be rock solid. All of the work is done, but I alternately cause oranges on mac/linux or windows because of platform inconsistencies with minidump paths. Once that mess is cleared up I'll put the remaining patches up.
Comment 17 Josh Matthews [:jdm] 2011-06-15 06:30:28 PDT
Created attachment 539492 [details] [diff] [review]
Part 1: Make crash report annotation work OOP and subsume existing workarounds.
Comment 18 Josh Matthews [:jdm] 2011-06-15 06:31:08 PDT
Created attachment 539493 [details] [diff] [review]
Part 2: Hook up CrashReporter functions to IPC.
Comment 19 Josh Matthews [:jdm] 2011-06-15 06:31:53 PDT
Created attachment 539494 [details] [diff] [review]
Part 5: Always run the IPC testshell callback, regardless of execution success or failure.
Comment 20 Josh Matthews [:jdm] 2011-06-15 06:33:08 PDT
Created attachment 539495 [details] [diff] [review]
Part 6: Test.
Comment 21 Josh Matthews [:jdm] 2011-06-15 06:35:13 PDT
All previous review comments addressed, one way or another. Chris, the friending occurs because TakeMinidump is private to the actors. I'm sure I can dig around in the IPDL generator to fix this if you want.
Comment 22 Josh Matthews [:jdm] 2011-06-15 06:36:30 PDT
Comment on attachment 539495 [details] [diff] [review]
Part 6: Test.

Ted, I made a couple other changes to the subprocess head file, so I'd appreciate you glancing over them to make sure they're ok.
Comment 23 Josh Matthews [:jdm] 2011-06-15 06:37:48 PDT
Created attachment 539496 [details] [diff] [review]
Part 2: Hook up CrashReporter functions to IPC.
Comment 24 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-06-15 09:46:29 PDT
Comment on attachment 539354 [details] [diff] [review]
Part 4: Disallow re-entry of IPC xpcshell command evaluation.

As discussed on irc I don't think we should make testshell very smart here. If a test spins an event loop that the test writer wasn't expecting then we should fix the test (and nuke the nested loop if possible!) rather than hide that detail.
Comment 25 Josh Matthews [:jdm] 2011-06-15 12:56:16 PDT
Created attachment 539626 [details] [diff] [review]
Part 5: Always run the IPC testshell callback, regardless of execution success or failure.
Comment 26 Josh Matthews [:jdm] 2011-06-15 13:02:39 PDT
Comment on attachment 539495 [details] [diff] [review]
Part 6: Test.

There's a new version coming up, but I'm waiting for try results before asking for review of the test changes.
Comment 27 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-06-15 14:15:34 PDT
Comment on attachment 539626 [details] [diff] [review]
Part 5: Always run the IPC testshell callback, regardless of execution success or failure.

Review of attachment 539626 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/testshell/TestShellParent.cpp
@@ +141,5 @@
>  
>    return JS_TRUE;
>  }
>  
> +class AsyncCommandCallback : public nsRunnable {

I don't think you should make this run asynchronously, as discussed on irc.

::: toolkit/xre/nsEmbedFunctions.cpp
@@ +756,5 @@
>  {
>    if (!gTestShellParent)
>      return true;
> +  ContentParent* cp = ContentParent::GetSingleton(PR_FALSE);
> +  return cp ? cp->DestroyTestShell(gTestShellParent) : true;

This seems wrong... Something should unset gTestShellParent if ContentParent dies. Add an ActorDestroy to TestShellParent too?
Comment 28 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-06-15 17:03:17 PDT
Comment on attachment 539492 [details] [diff] [review]
Part 1: Make crash report annotation work OOP and subsume existing workarounds.

Code looks OK.  Nits/comment stuff follow.

>+            (void) crashReporter->GenerateCrashReport(this, NULL);

I don't like (void) --- for functions that have a return value that
can be ignored, it's not helpful IMHO, and for functions that have a
return value that needs to be explicitly ignored
(NS_ATTR_WARN_UNUSED_RESULT), |(void)| doesn't do the trick.  Use
nothing for ignore-able return values and |unused << Foo()| for ones
that have to explicitly be ignored.  Note too that |(void)| missed the
convert-to-|void|-return signature change, which |unused <<| wouldn't
have.

>+  GenerateHangCrashReport(CrashReporter::AnnotationTable* processNotes);

|typedef CrashReporter::AnnotationTable AnnotationTable| in
 CrashReporterParent so that you can write AnnotationTable.

All these |processNotes| args should be |const AnnotationTable*|.

These new interfaces need documentation.

r=me with nit fixes and comments.
Comment 29 Benjamin Smedberg [:bsmedberg] 2011-06-17 10:45:00 PDT
Comment on attachment 539359 [details] [diff] [review]
Part 3: Disable alternate signal handlers if crash reporter is enabled.

As discussed on IRC, I think there are ordering issues here with the crashreporter (which gets installed after this hook in the main process).
Comment 30 Josh Matthews [:jdm] 2011-06-23 16:30:27 PDT
Created attachment 541542 [details] [diff] [review]
Part 1: Make crash report annotation work OOP and subsume existing workarounds.
Comment 31 Josh Matthews [:jdm] 2011-06-23 16:30:45 PDT
Created attachment 541543 [details] [diff] [review]
Part 2: Hook up CrashReporter functions to IPC.
Comment 32 Josh Matthews [:jdm] 2011-06-23 16:31:18 PDT
Created attachment 541544 [details] [diff] [review]
Part 3: Install child process exception handler before activating breakpad.
Comment 33 Josh Matthews [:jdm] 2011-06-23 16:32:51 PDT
Created attachment 541545 [details] [diff] [review]
Part 5: Always run the IPC testshell callback, regardless of execution success or failure.
Comment 34 Josh Matthews [:jdm] 2011-06-23 16:33:15 PDT
Created attachment 541546 [details] [diff] [review]
Part 6: Test.
Comment 35 Josh Matthews [:jdm] 2011-06-23 16:41:27 PDT
Now (finally) passes try on all platforms. I have a grudging respect for our wide test coverage. I think I've already received r+ for the meat of these patches, and the only differences now are minor ones to fix build/test problems.
Comment 36 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-06-29 09:31:59 PDT
Comment on attachment 541545 [details] [diff] [review]
Part 5: Always run the IPC testshell callback, regardless of execution success or failure.

Review of attachment 541545 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/ipc/ContentParent.cpp
@@ +329,5 @@
>  
> +TestShellParent*
> +ContentParent::GetTestShellSingleton()
> +{
> +    if (!ManagedPTestShellParent().Length())

Make this:

  if (ManagedPTestShellParent().IsEmpty())

And please add braces to single statement if blocks.

::: toolkit/xre/nsEmbedFunctions.cpp
@@ +714,1 @@
>      }

Nit: newline after this.

@@ +716,5 @@
> +    if (testShell) {
> +      return testShell;
> +    }
> +    testShell = parent->CreateTestShell();
> +    return testShell;

Make this:

  if (!testShell) {
    testShell = parent->CreateTestShell();
  }
  return testShell;

@@ +762,2 @@
>      return true;
> +  return cp->DestroyTestShell(tsp);

Make this:

  return tsp ? cp->DestroyTestShell(tsp) : true;
Comment 39 :Ms2ger (⌚ UTC+1/+2) 2011-06-30 04:32:46 PDT
Comment on attachment 541544 [details] [diff] [review]
Part 3: Install child process exception handler before activating breakpad.

># HG changeset patch
># User Josh Matthews <josh@joshmatthews.net>
># Date 1308871831 14400
># Node ID 9d08257cbc6a98a61c41d7b79245dfdb605f2e11
># Parent  f8aa713fb717c41d424f3410998d3fd7703f8a0e
>Bug 581341 - Part 3: Install child process exception handler before activating breakpad. r=bsmedberg
>
>diff --git a/toolkit/crashreporter/nsExceptionHandler.cpp b/toolkit/crashreporter/nsExceptionHandler.cpp
>--- a/toolkit/xre/nsEmbedFunctions.cpp
>+++ b/toolkit/xre/nsEmbedFunctions.cpp
>+
>+  SetupErrorHandling(aArgv[0]);  
>+
> #if defined(MOZ_CRASHREPORTER)
>   if (aArgc < 1)
>     return 1;

You seem to be touching argv[0] before knowing if there's anything there.
Comment 40 Marco Bonardo [::mak] 2011-06-30 05:48:52 PDT
looks like this has been backed out again by jdm, it's not in inbound.
Comment 41 Karl Tomlinson (:karlt) 2011-06-30 13:24:33 PDT
(In reply to comment #39)
> You seem to be touching argv[0] before knowing if there's anything there.

XRE_InitChildProcess has:

NS_ENSURE_ARG_MIN(aArgc, 2);
NS_ENSURE_ARG_POINTER(aArgv);
NS_ENSURE_ARG_POINTER(aArgv[0]);

and the line immediately following the context quoted in comment 39 has

 const char* const crashReporterArg = aArgv[--aArgc];
Comment 42 Josh Matthews [:jdm] 2011-07-08 07:56:15 PDT
Created attachment 544819 [details] [diff] [review]
Part 1: Make crash report annotation work OOP and subsume existing workarounds.

Chris, if you could sign off on the reversed direction and rpc-ness of the crash reporter actor creation, that would be swell. Try says it's quite happy with this series of patche, finally.
Comment 43 Josh Matthews [:jdm] 2011-07-08 07:59:45 PDT
Created attachment 544822 [details] [diff] [review]
Part 2: Hook up CrashReporter functions to IPC.

Ted, I want your approval that the change to the LSP annotation gatherer is ok. This was necessary to make windows xpcshell tests pass, because occasionally the annotator thread would finish before the crash reporter message was received from the parent.
Comment 44 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-07-08 08:24:09 PDT
Comment on attachment 544819 [details] [diff] [review]
Part 1: Make crash report annotation work OOP and subsume existing workarounds.

We discussed rpc-ifying the plugin crash reporter.  Why are you rpc-ifying the content crash reporter?
Comment 45 Josh Matthews [:jdm] 2011-07-08 08:26:46 PDT
Because I like to live life on the edge. I'll fix that.
Comment 46 Ted Mielczarek [:ted.mielczarek] 2011-07-08 11:22:59 PDT
Comment on attachment 544822 [details] [diff] [review]
Part 2: Hook up CrashReporter functions to IPC.

Review of attachment 544822 [details] [diff] [review]:
-----------------------------------------------------------------

::: widget/src/windows/LSPAnnotator.cpp
@@ +76,5 @@
> +    nsresult rv = cr->AnnotateCrashReport(NS_LITERAL_CSTRING("Winsock_LSP"), mString);
> +    if (NS_FAILED(rv)) {
> +      // If the crash reporter is enabled but annotation failed, we're probably
> +      // out-of-process and haven't properly setup up the remote bridge yet.
> +      // Let's give this another shot in a bit.

This is a bit icky, is there a way to push back the call to this function until things are ready, instead of racing and retrying? Not going to r- you on that (this isn't code I own!) but seems like it would be nicer.+
Comment 47 Brad Lassey [:blassey] (use needinfo?) 2011-07-22 10:02:54 PDT
Josh, what's the plan here. Were you thinking we'd take this for aurora? If so, that window is closing fast and we'd need a risk assessment.
Comment 48 Josh Matthews [:jdm] 2011-07-22 10:04:16 PDT
I really want to, but I haven't been able to land it without exposing new test breakages yet. I'm churning away.
Comment 49 Josh Matthews [:jdm] 2011-09-01 11:12:17 PDT
Created attachment 557578 [details] [diff] [review]
Hack around an MSVC PGO bug.
Comment 50 Josh Matthews [:jdm] 2011-09-13 11:47:14 PDT
Created attachment 560005 [details] [diff] [review]
Part 1: Make crash report annotation work OOP and subsume existing workarounds.
Comment 51 Josh Matthews [:jdm] 2011-09-13 11:47:43 PDT
Created attachment 560006 [details] [diff] [review]
Part 2: Hook up CrashReporter functions to IPC.  * *
Comment 52 Josh Matthews [:jdm] 2011-09-13 11:48:12 PDT
Created attachment 560007 [details] [diff] [review]
Part 3: Install child process exception handler before activating breakpad.
Comment 53 Josh Matthews [:jdm] 2011-09-13 11:49:00 PDT
Created attachment 560009 [details] [diff] [review]
Part 5: Always run the IPC testshell callback, regardless of execution success or failure.
Comment 54 Josh Matthews [:jdm] 2011-09-13 11:49:35 PDT
Created attachment 560010 [details] [diff] [review]
Part 6: Test.
Comment 55 Josh Matthews [:jdm] 2011-09-13 11:50:03 PDT
Created attachment 560011 [details] [diff] [review]
Hack around an MSVC PGO bug.
Comment 56 Josh Matthews [:jdm] 2011-09-13 11:54:20 PDT
Comment on attachment 560009 [details] [diff] [review]
Part 5: Always run the IPC testshell callback, regardless of execution success or failure.

The switch to multiple content processes wrecked the previous solution for shutting down the test shell. Ben, does this solution look ok?
Comment 57 Josh Matthews [:jdm] 2011-09-13 11:55:59 PDT
Comment on attachment 560006 [details] [diff] [review]
Part 2: Hook up CrashReporter functions to IPC.  * *

Ted, remember how I added a delayed LSP annotator runnable previously? I've generalized that to all crash reporter annotations; let me know if you'd like any changes.
Comment 58 Josh Matthews [:jdm] 2011-09-13 11:57:14 PDT
I believe I've hit all the reviewers necessary; all other patches should be able to carry forward the last r+.
Comment 59 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-09-21 17:13:16 PDT
Comment on attachment 560009 [details] [diff] [review]
Part 5: Always run the IPC testshell callback, regardless of execution success or failure.

Review of attachment 560009 [details] [diff] [review]:
-----------------------------------------------------------------

Looks ok, please fix the following and r=me.

::: dom/ipc/ContentParent.cpp
@@ +400,5 @@
> +TestShellParent*
> +ContentParent::GetTestShellSingleton()
> +{
> +    if (!ManagedPTestShellParent().Length())
> +        return nsnull;

Nit: Please brace.

::: toolkit/xre/nsEmbedFunctions.cpp
@@ +711,5 @@
>  #endif // XP_MACOSX
>  }
>  
>  namespace {
> +nsRefPtr<ContentParent> gContentParent = nsnull;

static nsRefPtr is a no-no. Use a raw pointer and manually addref/release. Add a comment here saying that the pointer carries a reference.

@@ +716,5 @@
>  TestShellParent* GetOrCreateTestShellParent()
>  {
> +    if (!gContentParent) {
> +        gContentParent = ContentParent::GetNewOrUsed();
> +    } else if (!gContentParent->IsAlive()) {

Nit: else on next line.

@@ +764,5 @@
>  bool
>  XRE_ShutdownTestShell()
>  {
> +    if (!gContentParent)
> +        return true;

Nit: Please brace.
Comment 60 Ted Mielczarek [:ted.mielczarek] 2011-09-22 13:09:13 PDT
Comment on attachment 560011 [details] [diff] [review]
Hack around an MSVC PGO bug.

Review of attachment 560011 [details] [diff] [review]:
-----------------------------------------------------------------

This is actually pretty clean. I'm impressed!

::: toolkit/crashreporter/nsExceptionHandler.cpp
@@ +1100,5 @@
>                                 NS_LITERAL_CSTRING("\n"));
>    return PL_DHASH_NEXT;
>  }
>  
> +// This function is miscompiled with MSVC 2005/2008 when PGO is on.  See bug 684500.

The bug number is extraneous, it's in the blame.

@@ +1102,5 @@
>  }
>  
> +// This function is miscompiled with MSVC 2005/2008 when PGO is on.  See bug 684500.
> +#ifdef _MSC_VER
> +#pragma optimize("", off)

This actually works? We had mixed results in the past, although that might have been due to inlining etc.
Comment 61 Josh Matthews [:jdm] 2011-09-22 13:31:10 PDT
(In reply to Ted Mielczarek [:ted, :luser] from comment #60)
> >  
> > +// This function is miscompiled with MSVC 2005/2008 when PGO is on.  See bug 684500.
> > +#ifdef _MSC_VER
> > +#pragma optimize("", off)
> 
> This actually works? We had mixed results in the past, although that might
> have been due to inlining etc.

It's worked on every try push so far!
Comment 62 Ted Mielczarek [:ted.mielczarek] 2011-09-23 07:56:23 PDT
Comment on attachment 560006 [details] [diff] [review]
Part 2: Hook up CrashReporter functions to IPC.  * *

Review of attachment 560006 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/crashreporter/nsExceptionHandler.cpp
@@ +1143,5 @@
> +  
> + private:
> +  nsCString mKey;
> +  nsCString mData;
> +  bool mAnnotate;

Using a bool here is not exactly clear. Could you maybe use an enum instead?

@@ +1161,5 @@
> +    PCrashReporterChild* reporter = CrashReporterChild::GetCrashReporter();
> +    if (!reporter) {
> +      nsRefPtr<DelayedNoteRunnable> event = new DelayedNoteRunnable(key, data);
> +      return NS_DispatchToMainThread(event);
> +    }

So this whole construct exists so that if the IPC crashreporter bits aren't set up yet, we'll handle it later?

Do we have confidence that this will only take one cycle through the event loop, or would it make more sense to instead stick these in a queue and flush them all out when things get properly intitialized?
Comment 63 Josh Matthews [:jdm] 2011-09-27 09:05:41 PDT
Created attachment 562782 [details] [diff] [review]
Part 2: Hook up CrashReporter functions to IPC.  * *

Here's the queue. I think it makes more sense that just blindly queueing runnables.
Comment 64 Josh Matthews [:jdm] 2011-09-27 09:12:18 PDT
Created attachment 562783 [details] [diff] [review]
Part 2: Hook up CrashReporter functions to IPC.  * *

Whoops, forgot the boolean change.
Comment 65 Ted Mielczarek [:ted.mielczarek] 2011-09-27 10:16:50 PDT
Comment on attachment 562783 [details] [diff] [review]
Part 2: Hook up CrashReporter functions to IPC.  * *

Review of attachment 562783 [details] [diff] [review]:
-----------------------------------------------------------------

Meta-note: you have trychooser syntax in your patch description, remember to remove that before landing.

This looks good modulo one thing I'm not sure about.

::: toolkit/crashreporter/nsExceptionHandler.cpp
@@ +239,5 @@
>  
> +// If annotations are attempted before the crash reporter is enabled,
> +// they queue up here.
> +class DelayedNoteRunnable;
> +nsTArray<nsRefPtr<DelayedNoteRunnable> > gDelayedAnnotations;

Is it safe to have a static nsTArray of nsRefPtrs? Get someone who knows the answer to sign off on that, please.
Comment 66 Karl Tomlinson (:karlt) 2011-09-27 21:29:29 PDT
nsTArray_base() does some initialization and normally we try to avoid static constructors.
http://hg.mozilla.org/mozilla-central/annotate/7f4867717226/xpcom/glue/nsTArray-inl.h#l44

I'd also expect some issues with leak checking tools running before the destructor, but I'm not up-to-date on the order in which those run.
Comment 67 Josh Matthews [:jdm] 2011-09-27 21:48:09 PDT
Yeah, the try run showed a leaking nsTArray_base for every single run. I'm going to need to find some other way to do that, obviously.
Comment 68 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-09-29 23:06:57 PDT
Hey Josh, how's this work coming along?  Still looking to land soon?
Comment 69 Josh Matthews [:jdm] 2011-09-30 00:02:48 PDT
Yep! I've been doing some try runs and getting some strange results on Android though, so I want to make sure that I'm not causing some of the oranges I saw.
Comment 71 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-09-30 17:57:07 PDT
\o/

Note You need to log in before you can comment on or make changes to this bug.