Closed Bug 983313 Opened 6 years ago Closed 6 years ago

Write crash events for plugin crashes and hangs

Categories

(Toolkit :: Crash Reporting, defect, P1)

defect

Tracking

()

VERIFIED FIXED
mozilla32
Tracking Status
firefox32 --- verified

People

(Reporter: gps, Assigned: adw)

References

(Blocks 1 open bug, )

Details

(Whiteboard: p=13 s=it-32c-31a-30b.2 [qa!])

Attachments

(2 files, 2 obsolete files)

+++ This bug was initially created as a clone of Bug #875562 +++

In bug 875562 we implemented a crash event subsystem. Unfortunately, we didn't get around to plugin crashes and hangs. This bug tracks implementing them.

We'll also need to update the FHR reporting mechanism to report the new events.
Assignee: nobody → gps
Status: NEW → ASSIGNED
Whiteboard: p=13 s=it-31c-30a-29b.2
Florin can you assign someone to verify this bug ? They should also probably verify bug 875562 as well. Thanks!
Flags: needinfo?(florin.mezei)
Whiteboard: p=13 s=it-31c-30a-29b.2 → p=13 s=it-31c-30a-29b.2 [qa+]
No longer blocks: fxdesktopbacklog
Flags: firefox-backlog+
Flags: needinfo?(florin.mezei)
QA Contact: andrei.vaida
QA Contact: andrei.vaida → paul.silaghi
Blocks: 994708
Whiteboard: p=13 s=it-31c-30a-29b.2 [qa+] → p=13 s=it-31c-30a-29b.3 [qa+]
Assignee: gps → nobody
Whiteboard: p=13 s=it-31c-30a-29b.3 [qa+] → p=13 [qa+]
Assignee: nobody → adw
Whiteboard: p=13 [qa+] → p=13 s=it-31c-30a-29b.3 [qa+]
Attached patch WIP patch (obsolete) — Splinter Review
Is this all there is to it?

It doesn't look like there's currently a way to distinguish plugin crashes from hangs.  They both result in a channel error, which ends up as an AbnormalShutdown in PluginModuleParent::ActorDestroy.  Is that right?  Am I supposed to add a way for PluginModuleParent to tell them apart?
Attachment #8410709 - Flags: feedback?(benjamin)
Comment on attachment 8410709 [details] [diff] [review]
WIP patch

>diff --git a/dom/ipc/CrashReporterParent.cpp b/dom/ipc/CrashReporterParent.cpp

>+void
>+CrashReporterParent::NotifyCrashService()
>+{
>+    nsCOMPtr<nsICrashService> crashService =
>+        do_GetService("@mozilla.org/crashservice;1");
>+    if (!crashService) {
>+        return;
>+    }
>+    crashService->AddPluginCrash(mChildDumpID);
>+}

The CrashReporterParent doesn't know whether this is a plugin crash or a content crash, which is important (or whether it's a plugin crash or hang). I think we should move this to the two toplevel actors: ContentParent and PluginModuleParent. In the ActorDestroy method, we know whether we have a crash dump or not: See 

http://hg.mozilla.org/mozilla-central/annotate/9d3da41ad0b6/dom/plugins/ipc/PluginModuleParent.cpp#l739
calls
http://hg.mozilla.org/mozilla-central/annotate/9d3da41ad0b6/dom/plugins/ipc/PluginModuleParent.cpp#l673

We should actually probably annotate a crash even if we weren't able to collect a dump, by writing an empty dump ID to the crash service.

Plugin hangs go through this codepath: http://hg.mozilla.org/mozilla-central/annotate/9d3da41ad0b6/dom/plugins/ipc/PluginModuleParent.cpp#l461

And content crashes go through here: http://mxr.mozilla.org/mozilla-central/source/dom/ipc/ContentParent.cpp#1274

The rest looks good to me! We'll also need a test, which I think you can piggyback onto the existing crashing-plugin tests at http://mxr.mozilla.org/mozilla-central/source/dom/plugins/test/mochitest/test_crashing.html?force=1 and http://mxr.mozilla.org/mozilla-central/source/dom/plugins/test/mochitest/test_hanging.html?force=1
Attachment #8410709 - Flags: feedback?(benjamin)
Whiteboard: p=13 s=it-31c-30a-29b.3 [qa+] → p=13 s=it-32c-31a-30b.1 [qa+]
Attached patch patch (obsolete) — Splinter Review
Thanks, Benjamin!

(In reply to Benjamin Smedberg  [:bsmedberg] from comment #3)
> The CrashReporterParent doesn't know whether this is a plugin crash or a
> content crash, which is important (or whether it's a plugin crash or hang).

Let me run this by you, because it looks like CrashReporterParent does know about the crash through its mProcessType and mNotes.  So this patch keeps the callout to the CrashService in CrashReporterParent.

> We should actually probably annotate a crash even if we weren't able to
> collect a dump, by writing an empty dump ID to the crash service.

It turns out CrashManager stores crash records in a Map keyed on crash IDs... given that, it didn't seem very useful to store records with empty IDs, so I didn't.  It would probably not be too hard to store key collisions in a chain, so I could do that if you'd like.

This adds three tests, and they all pass for me.  I need to update the ContentManager test, too.  I'll post another patch for that.

There's some code duplication and tiny API explosion in both the nsICrashService and CrashSubmit.jsm that I'm wondering might be worth fixing with a single function like:

void addChildCrash(in AString id, in long childType, in long crashType);

where crashType would be CRASH or HANG.
Attachment #8410709 - Attachment is obsolete: true
Attachment #8416864 - Flags: review?(benjamin)
> There's some code duplication and tiny API explosion in both the
> nsICrashService and CrashSubmit.jsm that I'm wondering might be worth fixing
> with a single function like:
> 
> void addChildCrash(in AString id, in long childType, in long crashType);
> 
> where crashType would be CRASH or HANG.

I think this is a good idea.

> > We should actually probably annotate a crash even if we weren't able to
> > collect a dump, by writing an empty dump ID to the crash service.
> 
> It turns out CrashManager stores crash records in a Map keyed on crash
> IDs... given that, it didn't seem very useful to store records with empty
> IDs, so I didn't.

ok. We do want it long-term, but I guess we want to add something to the FHR dataformat itself. Could you please file this as a followup?
Comment on attachment 8416864 [details] [diff] [review]
patch

>+++ b/toolkit/components/crashes/nsICrashService.idl

>+[scriptable, uuid(a1845b69-28f4-43db-be6e-02eb15a45481)]
>+interface nsICrashService : nsISupports
>+{
>+  /**
>+   * Record the occurrence of a crash in a content process.
>+   *
>+   * @param id
>+   *        Crash ID. Likely a UUID.
>+   */
>+  void addContentCrash(in AString id);

As noted in the prior comment, please make this more generic instead of having the three separate methods.
Attachment #8416864 - Flags: review?(benjamin) → review+
Attached patch patch 2Splinter Review
Asking for review again since it didn't seem right to have a generic addCrash() but not genericize other methods and getters.

In addition to the three new end-to-end tests that were in the previous patch, this also includes additions and changes to toolkit/components/crashes/tests/xpcshell/test_crash_manager.js and test_crash_store.js, which I said earlier I'd include in a different patch.

Should those new end-to-end tests live in toolkit/crashreporter instead of dom/ipc and dom/plugins?

This also fixes bug 982836 since it adds CrashManager.addCrash().

(In reply to Gregory Szorc [:gps] (not actively checking bugmail) from comment #0)
> We'll also need to update the FHR reporting mechanism to report the new
> events.

Is it OK to handle that in a different bug?  I see services/healthreport/tests/xpcshell/test_provider_crashes.js, but I'm not sure how to update provider.jsm.
Attachment #8416864 - Attachment is obsolete: true
Attachment #8417798 - Flags: review?(benjamin)
We can handle the FHR changes in a separate bug if necessary. But that is the final end-state we care about, and it's probably as simple as adding the new field to http://mxr.mozilla.org/mozilla-central/source/services/healthreport/providers.jsm#1051
Comment on attachment 8417798 [details] [diff] [review]
patch 2

Putting the end-to-end tests where they are is fine.

As discussed on IRC, you can remove the crash.plugin.1 and hang.plugin.1 event types from _handleEventFilePayload since we're not writing event files for these.
Attachment #8417798 - Flags: review?(benjamin) → review+
Attached patch FHR patchSplinter Review
This is based on the previous patch.  Asking Richard for review for the mechanical FHR changes and Benjamin for review to make sure this captures what we want.

I stuck main-hang in there to be forward compatible even though we're not actually catching main process hangs I don't think.
Attachment #8418232 - Flags: review?(rnewman)
Attachment #8418232 - Flags: review?(benjamin)
Attachment #8418232 - Flags: review?(benjamin) → review+
Comment on attachment 8418232 [details] [diff] [review]
FHR patch

Review of attachment 8418232 [details] [diff] [review]:
-----------------------------------------------------------------

::: services/healthreport/providers.jsm
@@ +1071,4 @@
>  
> +    let m = this.getMeasurement("crashes", 3);
> +    let fields =
> +      new Set(Object.keys(DailyCrashesMeasurement3.prototype.fields));

let fields = DailyCrashesMeasurement3.prototype.fields;
  
  ...
  
      if (!(type in fields)) {

seems like it would save on some garbage. We don't really get any value from being 'correct' and using a Set here.
Attachment #8418232 - Flags: review?(rnewman) → review+
Hopefully this is the last try run: https://tbpl.mozilla.org/?tree=Try&rev=de0a919a9f12

The new dom/ipc/tests/test_CrashService_crash.html was failing on try on e10s, Android, and b2g, but curiously the similar new tests in dom/plugins weren't.  Turns out tests in the latter don't run in a variety of cases, so I stopped the former from running in those cases, too.  I had to add a big run-if to dom/ipc/tests/mochitest.ini:

> +[test_CrashService_crash.html]
> +run-if = crashreporter && !e10s && (toolkit == 'gtk2' || toolkit == 'gtk3' || toolkit == 'cocoa' || toolkit == 'windows') && (buildapp != 'b2g' || toolkit == 'gonk')

!e10s because there's no point in running the test file itself OOP, and doing so doesn't actually work because it causes the CrashManager singleton to be instantiated from the child process, which means it can't correctly set up its directories.  The other two new tests in dom/plugins weren't failing because the tests in that directory are already not running on e10s.

The toolkit subexpression is equivalent to this: http://mxr.mozilla.org/mozilla-central/source/dom/plugins/test/moz.build?rev=6022ca67befb#11

The buildapp subexpression is copied from this: http://mxr.mozilla.org/mozilla-central/source/dom/plugins/test/mochitest/mochitest.ini?rev=d2c541609d89#2
Whiteboard: p=13 s=it-32c-31a-30b.1 [qa+] → p=13 s=it-32c-31a-30b.2 [qa+]
https://hg.mozilla.org/mozilla-central/rev/a6718c6dcdc9
https://hg.mozilla.org/mozilla-central/rev/576f07f4690b
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
32.0a1 (2014-05-14)
On Windows:
- by killing FlashPlayerPlugin_13_0_0_214.exe*32 (and the plugin crash reporter looks like this http://i.imgur.com/O2KGiiC.png?1 ) I see the plugin crashes are counted:
         - only after Firefox is restarted
         - even if the crash report is not sent
         "org.mozilla.crashes.crashes": {
                  "_v": 3,
                  "plugin-crash": 7
- by killing plugin-container.exe (and the plugin crash reporter looks like this http://i.imgur.com/PhlBWRD.png?1 ) nothing is counted
Mac and Linux have only the plugin-container process and nothing gets counted here too.

Thoughts?
Flags: needinfo?(benjamin)
Paul, how are you killing the plugin-container?

As implemented in this bug, we are only counting crashes which trigger the crash reporter. This means that you need to use crashinject.exe <pid-of-plugin-container> to trigger the crash.

As mentioned in comment 4, we're going to deal with no-crashreporter crashes as a followup.
Flags: needinfo?(benjamin)
Blocks: 1010476
(In reply to Benjamin Smedberg  [:bsmedberg] from bug 983313 comment #5)
> ok. We do want it long-term, but I guess we want to add something to the FHR
> dataformat itself. Could you please file this as a followup?

bug 1010476

(In reply to Benjamin Smedberg  [:bsmedberg] from comment #16)
> As mentioned in comment 4, we're going to deal with no-crashreporter crashes
> as a followup.

Did you mean no-dump crashes (and comment 5)?  CrashManager.jsm and everything else in toolkit/components/crashes is only built if MOZ_CRASHREPORTER.
Yes, I meant no-dump crashes.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #16)
> Paul, how are you killing the plugin-container?
I was just ending it from the Task Manager.

> you need to use crashinject.exe <pid-of-plugin-container> to trigger the crash.
I guess you meant crashfirefox.exe <pid-of-plugin-container>. That worked fine on Windows, thank you.
kill -n SIGABRT <pid-of-plugin-container> works fine on Linux too.
On Mac I still don't see anything counted, but I guess that's ok due to bug 1010476.
Status: RESOLVED → VERIFIED
Whiteboard: p=13 s=it-32c-31a-30b.2 [qa+] → p=13 s=it-32c-31a-30b.2 [qa!]
I expect mac plugin crashes to be counted as long as the UI asks you to submit the report. And I expect the UI to show that as long as you're killing it using SIGABRT.
Flags: needinfo?(paul.silaghi)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #20)
> the UI asks you to
> submit the report. And I expect the UI to show that as long as you're
> killing it using SIGABRT.
This part it's not happening when executing kill -n SIGABRT <pid-of-plugin-container>
Flags: needinfo?(paul.silaghi)
Thoughts due to comment 21 ?
Flags: needinfo?(benjamin)
That is a separate bug, filed as 1012912. It doesn't affect the data quality here in general.
Flags: needinfo?(benjamin)
You need to log in before you can comment on or make changes to this bug.