[e10s] ContentParent should annotate crash reports prior to calling GeneratePairedMinidump

RESOLVED FIXED in Firefox 41

Status

()

Core
DOM: Content Processes
RESOLVED FIXED
3 years ago
a year ago

People

(Reporter: jimm, Assigned: jimm)

Tracking

Trunk
mozilla41
x86_64
All
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(e10s+, firefox40 affected, firefox41 fixed)

Details

Attachments

(1 attachment, 8 obsolete attachments)

(Assignee)

Comment 1

3 years ago
Created attachment 8590803 [details] [diff] [review]
patch
(Assignee)

Updated

3 years ago
Attachment #8590803 - Flags: review?(ted)
(Assignee)

Comment 2

3 years ago
I tested this locally and everything seems to be working. To test on try I pushed this plus a plug that returns false from ContentParent's LoadPlugin. That should trigger this quite a bit in mochitest.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=fc4f0fac71bc
Comment on attachment 8590803 [details] [diff] [review]
patch

Review of attachment 8590803 [details] [diff] [review]:
-----------------------------------------------------------------

Looking at how empty that .extra file was I feel like we're still missing some annotation bits that get called in the normal hang/crash case. I started looking into it this morning but got a little lost and don't have that much time to devote to sorting it out right now.
Attachment #8590803 - Flags: review?(ted) → review+
(Assignee)

Comment 4

3 years ago
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #3)
> Comment on attachment 8590803 [details] [diff] [review]
> patch
> 
> Review of attachment 8590803 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looking at how empty that .extra file was I feel like we're still missing
> some annotation bits that get called in the normal hang/crash case. I
> started looking into it this morning but got a little lost and don't have
> that much time to devote to sorting it out right now.

I'll file a follow up on this. Eventually someone will need this data and fix it. :)
(Assignee)

Comment 6

3 years ago
Created attachment 8590987 [details] [diff] [review]
wip
Attachment #8590803 - Attachment is obsolete: true
(Assignee)

Comment 7

3 years ago
Created attachment 8590989 [details] [diff] [review]
wip
Attachment #8590987 - Attachment is obsolete: true
(Assignee)

Comment 8

3 years ago
Created attachment 8591037 [details] [diff] [review]
tmp.txt
Attachment #8590989 - Attachment is obsolete: true
(Assignee)

Comment 9

3 years ago
Created attachment 8591040 [details] [diff] [review]
wip
Attachment #8591037 - Attachment is obsolete: true
(Assignee)

Comment 10

3 years ago
Using the latest wip, good meta data here. Going to push to try again and trigger a crash, see what we get.

https://crash-stats.mozilla.com/report/index/5b1fe74c-67f6-4710-92f5-852742150410
(Assignee)

Comment 11

3 years ago
Created attachment 8591094 [details] [diff] [review]
patch v.2
Attachment #8591040 - Attachment is obsolete: true
(Assignee)

Comment 12

3 years ago
Still getting a crash in google_breakpad::ExceptionHandler::WriteMinidump() in a really weird place - 

http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/google-breakpad/src/client/windows/handler/exception_handler.cc#741
(Assignee)

Comment 13

3 years ago
Hmm, this top frame was found through "given as instruction pointer in context".. I wonder if that can be inaccurate. Working with a debug build might be better here.
(Assignee)

Comment 14

3 years ago
(In reply to Jim Mathies [:jimm] from comment #13)
> Hmm, this top frame was found through "given as instruction pointer in
> context".. I wonder if that can be inaccurate. Working with a debug build
> might be better here.

reliable as in, sits at the very bottom of the "trust list" - 

http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/google-breakpad/src/google_breakpad/processor/stack_frame.h#53
Yeah, that's accurate. What did your try log look like?
(Assignee)

Comment 16

3 years ago
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #15)
> Yeah, that's accurate. What did your try log look like?

It crashed in the same place - 

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a2e0187da8ef

I'm wondering if this has something to do with that WaitForSingleObject call ExceptionHandler makes when it yields to its worker thread. Maybe something is running out of band, through some sort of callback or something. Just a guess.

The big question I'm left with is how to debug it, since I can't reproduce locally.
So this isn't crashing. Note:
 PROCESS-CRASH | Main app process exited normally | application crashed [@ ntdll.dll + 0x470b4] 

This is just what a minidump from the browser process looks like when you call CreatePairedMinidumps. The problem is that something is calling this code path and either not calling SimpleTest.expectChildProcessCrash(), or something in your patch mucks up how expectChildProcessCrash cleans up the expected dumps.
(Assignee)

Comment 18

3 years ago
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #17)
> So this isn't crashing. Note:
>  PROCESS-CRASH | Main app process exited normally | application crashed [@
> ntdll.dll + 0x470b4] 
> 
> This is just what a minidump from the browser process looks like when you
> call CreatePairedMinidumps. The problem is that something is calling this
> code path and either not calling SimpleTest.expectChildProcessCrash(), or
> something in your patch mucks up how expectChildProcessCrash cleans up the
> expected dumps.

That's good to hear. Maybe I'm just running into some random test failures here.
(Assignee)

Comment 19

3 years ago
(In reply to Jim Mathies [:jimm] from comment #18)
> (In reply to Ted Mielczarek [:ted.mielczarek] from comment #17)
> > So this isn't crashing. Note:
> >  PROCESS-CRASH | Main app process exited normally | application crashed [@
> > ntdll.dll + 0x470b4] 
> > 
> > This is just what a minidump from the browser process looks like when you
> > call CreatePairedMinidumps. The problem is that something is calling this
> > code path and either not calling SimpleTest.expectChildProcessCrash(), or
> > something in your patch mucks up how expectChildProcessCrash cleans up the
> > expected dumps.
> 
> That's good to hear. Maybe I'm just running into some random test failures
> here.

Ha! Forgot I was doing this on purpose so I could see the crash reports!

https://hg.mozilla.org/try/rev/4fcacd5134ba#l1.12
(Assignee)

Comment 21

3 years ago
Created attachment 8591928 [details] [diff] [review]
patch v.3
Attachment #8591094 - Attachment is obsolete: true
Attachment #8591928 - Flags: review?(ted)
(Assignee)

Updated

3 years ago
tracking-e10s: --- → +
Comment on attachment 8591928 [details] [diff] [review]
patch v.3

Review of attachment 8591928 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/ipc/ContentParent.cpp
@@ +3309,5 @@
>  
>  #if defined(MOZ_CRASHREPORTER) && !defined(MOZ_B2G)
> +    // We're about to kill the child process associated with this content.
> +    // Something has gone wrong to get us here, so we generate a minidump
> +    // of the parent and child for submittal to the crash server.

The word you're looking for is "submission". :)

::: dom/ipc/CrashReporterParent.cpp
@@ +145,5 @@
> +        // If we don't have a dump id yet, we haven't called
> +        // CrashReporter::CreatePairedMinidumps. Apply any annotations using the
> +        // generic crash reporter api. When we call CreatePairedMinidumps
> +        // these annotations will get written out to an extra file associated
> +        // with our report.

I'm not sure this is right. These annotations are from the child process, but calling AnnotateCrashReport sticks them into the chrome process' list of annotations. That means that if we later crashed the chrome process we'd get these annotations in the crash report which would be confusing.

@@ +180,5 @@
> +
> +    if (NS_IsMainThread()) {
> +        // Inline, this is the main thread. Get this done.
> +        NotifyCrashService();
> +        return;

It'd be nice if this pattern was easier to write with a one-liner that did the right thing for main-thread-or-not.
Attachment #8591928 - Flags: review?(ted)
(Assignee)

Comment 23

3 years ago
Created attachment 8602147 [details] [diff] [review]
patch v.4
Attachment #8591928 - Attachment is obsolete: true
(Assignee)

Comment 24

3 years ago
Created attachment 8602150 [details] [diff] [review]
patch v.4
Attachment #8602147 - Attachment is obsolete: true
(Assignee)

Comment 25

3 years ago
Comment on attachment 8602150 [details] [diff] [review]
patch v.4

this adds the all in one call, and basically just bails if somebody does something in the wrong order.
Attachment #8602150 - Flags: review?(ted)
(Assignee)

Updated

3 years ago
Blocks: 1160142
Comment on attachment 8602150 [details] [diff] [review]
patch v.4

Review of attachment 8602150 [details] [diff] [review]:
-----------------------------------------------------------------

This is all getting very complicated for my brain to handle, but I think I follow.
Attachment #8602150 - Flags: review?(ted) → review+
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #22)
> ...
> 
> ::: dom/ipc/CrashReporterParent.cpp
> @@ +145,5 @@
> > +        // If we don't have a dump id yet, we haven't called
> > +        // CrashReporter::CreatePairedMinidumps. Apply any annotations using the
> > +        // generic crash reporter api. When we call CreatePairedMinidumps
> > +        // these annotations will get written out to an extra file associated
> > +        // with our report.
> 
> I'm not sure this is right. These annotations are from the child process,
> but calling AnnotateCrashReport sticks them into the chrome process' list of
> annotations. That means that if we later crashed the chrome process we'd get
> these annotations in the crash report which would be confusing.

I'd actually want to get the annotations from all processes in the crash report, or did I misunderstand the objection?  I don't care if it's separated somehow, as long as the annotations from child processes are available in the crash report even if it was only the chrome process that crashed.
(In reply to Milan Sreckovic [:milan] from comment #27)
> I'd actually want to get the annotations from all processes in the crash
> report, or did I misunderstand the objection?  I don't care if it's
> separated somehow, as long as the annotations from child processes are
> available in the crash report even if it was only the chrome process that
> crashed.

That's not how it works, and I don't see how you could make that workable in the general sense. If we have >1 content process and they all AnnotateCrashReport("URL", <whatever>), you want those annotations to be per-process.

Currently the main process has a set of annotations, and each child process has its own set of annotations. When the main process crashes it uses its set. When a child process crashes it uses its own set along with a subset of the parent process annotations (specifically everything except the ones in the blacklist here: https://hg.mozilla.org/mozilla-central/annotate/baa9c64fea6f/toolkit/crashreporter/nsExceptionHandler.cpp#l308 ).
It may be how it works, but that makes the crash report less useful, as it may have been the child process that caused the parent process to crash, and we lose valuable information if we don't see the details of the child process annotations.

Child process crashing picking up a subset of parent annotations sounds good - why can't parent process crashing also pick up annotations from all the child processes?
And I understand that the child process is not "crashing" when the parent process crashes, but I'm arguing that there is enough information in the child process annotations that I'd want them picked up even though the child process isn't crashing, as such.
We have multiple child processes. How should we sort out annotations from each? Even in a single content process e10s case we can have that content process + multiple plugin (and GMP) processes.
https://hg.mozilla.org/mozilla-central/rev/ce6dbefadccd
Status: NEW → RESOLVED
Last Resolved: 3 years ago
status-firefox41: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla41

Updated

a year ago
Depends on: 1360308
You need to log in before you can comment on or make changes to this bug.