Closed Bug 1282776 Opened 4 years ago Closed 10 months ago

Some plugin-process crashes are not marked as plugin crashes (submitted-from-infobar related?)

Categories

(Core :: Plug-ins, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla70
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- fixed
firefox68 --- wontfix
firefox69 --- fixed
firefox70 --- fixed

People

(Reporter: benjamin, Assigned: gsvelto)

References

Details

(Keywords: qawanted, Whiteboard: [fce-active-legacy])

Attachments

(1 file)

https://crash-stats.mozilla.com/search/?product=Firefox&signature=~AssertPluginThread&release_channel=nightly&release_channel=aurora&process_type=%21plugin&_facets=plugin_name&_facets=process_type&_facets=dom_ipc_enabled&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=dom_ipc_enabled#crash-reports

From what I can tell, all of these are plugin-process crashes. Most of them have e10s enabled, but not all, so I don't know whether this is e10s-related or not. 100% of them were submitted from the infobar, which seems suspicious to me. (Found as a result of looking at bug 1205979).

Jim, this seems unfortunate to me but I'm not sure what it's relative priority is. What do you think?
Flags: needinfo?(jmathies)
Looking through a batch of these:

1) everything is in plugin-container.exe (plugin crash)
2) uptime = 0

^ have to dig into plugin process code to see if we set this properly. I seriously doubt these crashes are happening in the first 0 seconds of running.

3) thread 0 stack traces predominately appear to be in
mozilla::plugins::PluginInstanceChild::Destroy()

^ have to do some python analysis to confirm this since currently I'm just manually looking these over.

4) every crash appears to be related to java
Ryan, any chance softvision could do a round of e10s testing with java plugins looking for crashy test cases?
Blocks: e10s-crashes
Flags: needinfo?(jmathies) → needinfo?(ryanvm)
Keywords: qawanted
Flags: needinfo?(jmathies)
Sure.
Flags: needinfo?(ryanvm) → needinfo?(mfunches)
ack: we have been testing Flash Plugin (Blocking/cp2) this week; I will update test workbook with this request.
Flags: needinfo?(mfunches)
Benjamin/Jim; We have completed testing for the Java Plugin with e10s enabled. 
We picked several sites that make use of the java applets. You can review the information at this link: 

https://docs.google.com/spreadsheets/d/1VNljSnOkciM471nWVI4dQBVF1RUf3ULh-jUU1CT-J-k/edit#gid=1865772785

Please "ni" me if you would like additional testing or confirmations done.
a) I have seen bug reports related to GoToWebinar, however we do not have accounts for that. If Moz has a test account, we will test.
b) We had no immediate crashes, however the Windows 10x64 applying Fx.32 did experience some crashes. The IDs are listed on the Java Notes tab.
c) I have created the following wiki where I will update information related to ongoing Plugin testing - https://wiki.mozilla.org/QA/Plugin_Compatibility_Testing
(In reply to Michelle Funches - QA from comment #5)
> Benjamin/Jim; We have completed testing for the Java Plugin with e10s
> enabled. 
> We picked several sites that make use of the java applets. You can review
> the information at this link: 
> 
> https://docs.google.com/spreadsheets/d/1VNljSnOkciM471nWVI4dQBVF1RUf3ULh-
> jUU1CT-J-k/edit#gid=1865772785
> 
> Please "ni" me if you would like additional testing or confirmations done.
> a) I have seen bug reports related to GoToWebinar, however we do not have
> accounts for that. If Moz has a test account, we will test.
> b) We had no immediate crashes, however the Windows 10x64 applying Fx.32 did
> experience some crashes. The IDs are listed on the Java Notes tab.
> c) I have created the following wiki where I will update information related
> to ongoing Plugin testing -
> https://wiki.mozilla.org/QA/Plugin_Compatibility_Testing

Thanks!
Flags: needinfo?(jmathies)
Gabriele is this possibly the same as the bug you fixed about plugin crashes?
Flags: needinfo?(gsvelto)
Priority: -- → P2
This doesn't look like bug 1284030 for a number of reasons: first of all it seems to affect versions both before bug 1284030 was introduced and after it was fixed. The uptime is 0 which also makes it unlikely since to trigger bug 1284030 you need to wait for a hang and then for an additional 5s and finally AssertPluginThread() is calling MOZ_RELEASE_ASSERT() which will generate a minidump upon failure so these do look like genuine assertions to me where the plugin code in npjp2.dll is calling NPN_ReleaseObject() on the wrong thread.
Flags: needinfo?(gsvelto)
The crashes themselves are an "expected" problem with the Java plugin. The question we need to solve (urgently) is why these are not being marked as plugin-process crashes.
Flags: needinfo?(gsvelto)
Alright, I'll have a look at the code that's populating those crash reports.


BTW looking through other reports I've noticed that that having a 0s uptime is not that uncommon:

https://crash-stats.mozilla.com/search/?uptime=0&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature

So either it's true (even though it's strange) or we have a bug where we do not set the uptime correctly under certain conditions. Those conditions don't seem specific to this bug though.

Leaving the NI.
Whiteboard: [fce-active]
Assignee: nobody → gsvelto
Status: NEW → ASSIGNED
I've noticed that the reports are missing a bunch of fields, not just ProcessType: PluginFilename,
PluginName, PluginVersion and StartupTime are also empty.
OK, here's what all those fields have in common: they're all populated under CrashReporterParent::GenerateChildData()

http://hg.mozilla.org/mozilla-central/file/60b340e55efe/dom/ipc/CrashReporterParent.cpp#l115

There's not many failure modes that would lead to CrashReporterParent::GenerateChildData() not writing data out: I don't think it's possible that the method is not being called as the crash wouldn't be informed at all (it seems to live on the same path that leads to NotifyCrashReport()). It is possible that it's hitting some issue when writing to disk though: those fields are the last one to be written out to the .extra file so if something funny happens (e.g. out of space, file locked by some other process, etc...) they won't be written out successfully.

Unfortunately unless I have a way to reproduce the issue it's very hard to tell what's going wrong.
Flags: needinfo?(gsvelto)
Michelle, are any of those crashes (listed in Java Notes) 100% reproducible?
Flags: needinfo?(mfunches)
Actually, that perhaps does not matter. Those crashes don't match the pattern bsmedberg initially described.
Flags: needinfo?(mfunches)
It seems we have a culprit, see bug 1277582 comment 66.
See Also: → 1277582
Duplicate of this bug: 1318136
This has been fixed in bug 1277582.
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1277582
hi, this is marked as duplicate of bug 1277582 which is marked as fixed on 50+. however there are still loads of plugin crashes coming through the infobar that end up in the process_type "browser", like these flash related crashes on 52.0a2:
https://crash-stats.mozilla.com/search/?submitted_from_infobar=__true__&signature=^npswf&product=Firefox&version=52.0a2&process_type=browser#facet-signature
Flags: needinfo?(gsvelto)
(In reply to [:philipp] from comment #18)
> hi, this is marked as duplicate of bug 1277582 which is marked as fixed on
> 50+. however there are still loads of plugin crashes coming through the
> infobar that end up in the process_type "browser", like these flash related
> crashes on 52.0a2:
> https://crash-stats.mozilla.com/search/
> ?submitted_from_infobar=__true__&signature=^npswf&product=Firefox&version=52.
> 0a2&process_type=browser#facet-signature

Then it's not a duplicate; it seemed very similar but apparently it's not :-/ Reopening.
Status: RESOLVED → REOPENED
Flags: needinfo?(gsvelto)
Resolution: DUPLICATE → ---
Whiteboard: [fce-active] → [fce-active-legacy]

I finally figured out what's the problem here, those are crashes happening so early in the process startup that the generated crash report is incomplete and doesn't have the process type. See bug 1566855 comment 31 for more details. The fix for bug 1566855 will fix this bug too.

This changes the way crash reports for child processes happening too early
during the child process' startup. Before bug 1547698 we wrote a partial
.extra file with those crashes that lacked the process type. The user would
not be notified of those crashes until she restarted Firefox and even when
submitted those crashes would be erroneously labeled as browser crashes.

After bug 1547698 we stopped writing .extra files entirely for those crashes
which left orphaned .dmp files among the pending crash reports.

This patch does three things to improve the situation:

  • It writes a partial .extra file so that the crashes are detected at the next
    startup. So the user is still not notified directly of these crashes but she
    can report them later.
  • It adds the process type to the .extra file so that the crash reporters are
    labelled correctly.
  • It fixes a leak in the pidToMinidump hash-map. Since the crashes were
    not finalized the ChildProcessData strucutre associated with them would
    never be fred.
Status: REOPENED → ASSIGNED
Blocks: 1571711

Note for the reviewer: I'm not fond of the huge amount of code duplication here hence I opened bug 1571711 to address it.

Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ac41bb02e30b
Finalize crash reports for child process crashes happening too early r=froydnj
Status: ASSIGNED → RESOLVED
Closed: 3 years ago10 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla70

I'd like to uplift this to beta since it's the fix for the root cause of bug 1566855 but I'll wait for new crash reports to flow in from nightly to be sure the problem is solved.

Bugbug thinks this bug is a regression, but please revert this change in case of error.

Keywords: regression

(In reply to Release mgmt bot [:sylvestre / :calixte / :marco for bugbug] from comment #26)

Bugbug thinks this bug is a regression, but please revert this change in case of error.

The problem has always been present.

Keywords: regression

This is working like a charm. It already caught a couple of crashes that we would have missed before such as bug 1573731 and bug 1574571. Time for uplifting it.

Comment on attachment 9083330 [details]
Bug 1282776 - Finalize crash reports for child process crashes happening too early

Beta/Release Uplift Approval Request

  • User impact if declined: Some child process crashes might not be reported.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Medium
  • Why is the change risky/not risky? (and alternatives if risky): This modifies the way we handle crashes during child process early startup. As such the change touches all the different child processes we have. I consider it medium risk mostly because of that, the code change is not very small even though it's not large either. I believe it's still worth the risk because this allows us to catch crashes that were flying under the radar before. For example we wouldn't have noticed bug 1573731 and bug 1574571 in nightly w/o this change.
  • String changes made/needed: none
Attachment #9083330 - Flags: approval-mozilla-beta?

Comment on attachment 9083330 [details]
Bug 1282776 - Finalize crash reports for child process crashes happening too early

Allows us to catch some previously-missed crashes. Approved for 69.0b16. I guess we can wontfix ESR68 since this issue has basically existed since ever?

Attachment #9083330 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9083330 [details]
Bug 1282776 - Finalize crash reports for child process crashes happening too early

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: Without uplifting this bug we won't have visibility into a number of crashes happening early during child process startup. If ESR68 is affected by those types of crashes (e.g. bug 1573731 and bug 1574571) we won't notice.
  • User impact if declined: Some child process crashes might not be reported.
  • Fix Landed on Version: 70
  • Risk to taking this patch: Medium
  • Why is the change risky/not risky? (and alternatives if risky): See comment 29. This will also require uplifting bug 1572565 to apply cleanly. Bug 1572565 is a small, simple patch that fixes an issue with sandbox broker process crash counts being stored under the wrong name in telemetry.
  • String or UUID changes made by this patch: none
Attachment #9083330 - Flags: approval-mozilla-esr68?

Comment on attachment 9083330 [details]
Bug 1282776 - Finalize crash reports for child process crashes happening too early

This is proving effective to help us catch crashes which were previously going unreported. Approved for 68.1esr.

Attachment #9083330 - Flags: approval-mozilla-esr68? → approval-mozilla-esr68+
You need to log in before you can comment on or make changes to this bug.