Closed Bug 1323097 Opened 8 years ago Closed 5 years ago

Crash reason does not appear in minidumps submitted via the infobar

Categories

(Toolkit :: Crash Reporting, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: billm, Unassigned)

References

(Depends on 1 open bug)

Details

I think we don't get the crash reason since it's attached via PrepareChildExceptionTimeAnnotations. That function writes its annotations into a temp file that (I believe) doesn't get picked up by the infobar code.

When we fix this, we should also make sure that the shmem code is updated as well (see bug 1278717).
Brad, can you help find an owner for this? This came up in bug 1322211 as a reason for why we're getting crash submissions without an abort reason.
Flags: needinfo?(blassey.bugs)
Having talked to aklotz, it seems like bug 1236108 made it so that the content process, when it crashes, creates both a minidump as well as its own .extra file (called GeckoChildCrash<pid>.extra). That .extra file is where the information we care about is.

Strangely, we've been apparently getting the information from that .extra file from about:tabcrashed, and both about:tabcrashed and the unsubmitted crash report notification bar both submit using the same mechanism.

Anyhow, I'll look into this.
Assignee: nobody → mconley
Status: NEW → ASSIGNED
Flags: needinfo?(blassey.bugs)
Flags: needinfo?(mconley)
Actually, more interesting is that with the function at [1], when constructing the .extra file for the main dump, the content process .extra information is merged into it.

I wonder if what's happening here is that sometimes this function isn't running - perhaps in the event that the parent process hasn't also had a minidump collected for it (ie, it wasn't a paired minidump).

[1]: http://searchfox.org/mozilla-central/rev/e95e4ed8b5229a29dacc32c0b90968df5e7a6ff3/toolkit/crashreporter/nsExceptionHandler.cpp#3298-3299
Hey billm, in bug 1322211 comment 2, you mentioned that the we're not picking up the extra values from the content process. It looks like bug 1236108 added code to sew the .extra file from the parent process together with one from the content process, and that's what gets sent. I just confirmed that this merging and sending works when sending from the infobar (which isn't super surprising, since they share the same crash report submission backend).

Do you know of a reliable way of creating the class of crash that caused this bug to be filed? Perhaps something in how we write the child process .extra file (or merge with the parent process) is falling over in certain conditions.

In any case, I suspect the notification bar code is innocent.
Flags: needinfo?(mconley) → needinfo?(wmccloskey)
If the parent process has already crashed or shut down, then we won't do the merging. I suspect that's what is happening. I'm not really sure what we could do about this then.

I think Chrome has a crash reporter process that just sits there and watches for crashes from all the other processes. That seems like a much better design, but we're probably a long way from that point.
Flags: needinfo?(wmccloskey)
(In reply to Bill McCloskey (:billm) from comment #5)
> If the parent process has already crashed or shut down, then we won't do the
> merging. I suspect that's what is happening. I'm not really sure what we
> could do about this then.
> 

Ah. In that case, I wonder if it'd be better to try sending the content process dump and its .extra file (which should still be on the disk) on their own.

Would that be useful? Is it even possible / supported to send a content process crash dump that isn't paired with a stack from the parent?

peterbe, do you happen to know the answer to that last question?
Flags: needinfo?(peterbe)
(In reply to Mike Conley (:mconley) from comment #6)
> (In reply to Bill McCloskey (:billm) from comment #5)
> > If the parent process has already crashed or shut down, then we won't do the
> > merging. I suspect that's what is happening. I'm not really sure what we
> > could do about this then.
> > 
> 
> Ah. In that case, I wonder if it'd be better to try sending the content
> process dump and its .extra file (which should still be on the disk) on
> their own.
> 
> Would that be useful? Is it even possible / supported to send a content
> process crash dump that isn't paired with a stack from the parent?

Yes, it's certainly possible. I thought it was even the typical case.
If I'm reading this right, then we create paired minidumps even when the content process goes down without the parent force-killing it: http://searchfox.org/mozilla-central/rev/f5077ad52f8b90183e73038869f6140f0afbf427/dom/ipc/ContentParent.cpp#1591-1593

In any case, perhaps the UnsubmittedCrashHandler can scan for those GeckoChildCrash<pid>.extra files and "do the right thing". I'll look at that today.
(In reply to Mike Conley (:mconley) from comment #8)
> If I'm reading this right, then we create paired minidumps even when the
> content process goes down without the parent force-killing it:
> http://searchfox.org/mozilla-central/rev/
> f5077ad52f8b90183e73038869f6140f0afbf427/dom/ipc/ContentParent.cpp#1591-1593

I'm now second-guessing that part. I don't think my mental model of how it works is exactly correct. It's quite possible we only submit the dump for the content process here (which makes sense - the parent process dump is unlikely useful when the content process goes down for reasons unrelated to what the parent is doing).
Hey aklotz, is there any way for the content process to know the UUID that the content process dump is going to be written to at the time when the GeckoChildCrash<pid>.extra file is written?

I ask because there's not currently a great way of associating loose GeckoChildCrash<pid>.extra files that haven't been merged with dumps in the directory. If we were able to make this, for example, <uuid>-GeckoChildCrash<pid>.extra, that'd make it much simpler. Or is this information not available when that file is written?
Flags: needinfo?(aklotz)
Yeah, in the normal case (where the content process simply crashes) we just send a single minidump. Paired minidumps are used for hangs where we force-kill the content process and some other situation that I've forgotten the details of. If the content process crash doesn't have the additional parent annotations merged in submission is not likely to work, since I believe that's where it gets things like the application name and whatnot.
Flags: needinfo?(peterbe)
(In reply to Mike Conley (:mconley) from comment #10)
> Hey aklotz, is there any way for the content process to know the UUID that
> the content process dump is going to be written to at the time when the
> GeckoChildCrash<pid>.extra file is written?

As discussed today on IRC:

11:38:35 AM - aklotz: mconley: what if we did it the other way around: each content generates its own uuid, registers that w/ parent. Parent records the child uuids as part of its .extra file. Then if we need to do the merge later on, we see those annotations in the parent .extra file and use them as pointers to the child .extra files to merge?
11:40:05 AM - mconley: That's not bad
11:41:17 AM - mconley: aklotz: although in this scenario, the parent .extra file has never been written
11:41:30 AM - mconley: but I like this line of thinking
11:42:34 AM - aklotz: mconley: yeah, that really only works in the "parent crashing too" case... but yeah I think this could be fleshed out in a way that could work
11:42:54 AM - aklotz: just need to find a nice way to persist that info

Then we discussed the bug 1338286 route.
Flags: needinfo?(aklotz)
Just FYI, David is changing a lot of stuff in bug 1338308. That will move everything over to the shmem approach.
Okay. It's quite possible in that case that bug 1338308 will take care of this particular case. Once that bug closes, I'll poke at this again to see if there's anything that the UnsubmittedCrashHandler needs to do.
Hey dvander, great job with bug 1338308! I'm curious - now that that's in tree, do you know if we can safely use the shmem mechanism to send these annotations that are computed at exception time in the content process: http://searchfox.org/mozilla-central/rev/ea31c4b64f34a29415a69fb768f8051495547315/toolkit/crashreporter/nsExceptionHandler.cpp#1387-1419 ?

If so, we can probably just use that instead of the stuff from bug 1236108 for content process annotations.
Flags: needinfo?(dvander)
(In reply to Mike Conley (:mconley) from comment #15)
> Hey dvander, great job with bug 1338308! I'm curious - now that that's in
> tree, do you know if we can safely use the shmem mechanism to send these
> annotations that are computed at exception time in the content process:
> http://searchfox.org/mozilla-central/rev/
> ea31c4b64f34a29415a69fb768f8051495547315/toolkit/crashreporter/
> nsExceptionHandler.cpp#1387-1419 ?
> 
> If so, we can probably just use that instead of the stuff from bug 1236108
> for content process annotations.

I'm not 100% sure, it would depend on two things:
 (1) that the memory location containing the CrashReporterClient/shmem hasn't been stomped
 (2) the UI process hasn't yet tried to read the metadata shmem (which happens on ActorDestroy, after the IPDL pipe is detected broken)

If both of those are true it would be safe to switch to normal annotations again. Though now that I think about it, there's no reason we can't just do the annotation right before the exception occurs (i.e. inside MOZ_CRASH and wherever).
Flags: needinfo?(dvander)
(In reply to David Anderson [:dvander] from comment #16)
> 
> I'm not 100% sure, it would depend on two things:
>  (1) that the memory location containing the CrashReporterClient/shmem
> hasn't been stomped
>  (2) the UI process hasn't yet tried to read the metadata shmem (which
> happens on ActorDestroy, after the IPDL pipe is detected broken)
> 

Okay, great. I don't think there's too much I can do about (1), but if my memory serves me, (2) is already satisfied. I believe that by the time that the ExceptionHandler has been writing these annotations, the content process is still in the process of going down, and the pipe should still be up. aklotz, does that sound right?

> If both of those are true it would be safe to switch to normal annotations
> again. Though now that I think about it, there's no reason we can't just do
> the annotation right before the exception occurs (i.e. inside MOZ_CRASH and
> wherever).

I suppose we could do that, assuming we only cared about the crashes that are occurring intentionally. Another question for aklotz - can I assume that PrepareChildExceptionTimeAnnotations is writing annotations that are useful even when we're crashing unintentionally?
Flags: needinfo?(aklotz)
(In reply to Mike Conley (:mconley) from comment #17)
> I suppose we could do that, assuming we only cared about the crashes that
> are occurring intentionally. Another question for aklotz - can I assume that
> PrepareChildExceptionTimeAnnotations is writing annotations that are useful
> even when we're crashing unintentionally?

Yes, you may assume that.
Flags: needinfo?(aklotz)
I'm afraid Photon / QF stuff is taking a lot of my cycles lately. Freeing this up.
Assignee: mconley → nobody
Status: ASSIGNED → NEW

All issues related to missing annotations should have been solved by bug 1547698. Also if I lookup crashes sent from the infobar on crashstats plenty of them have the crash reason set so I guess this if fixed.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.