Closed Bug 1263951 Opened 4 years ago Closed 4 years ago

crash in mozilla::ipc::TransferHandleToProcess (GMP related)

Categories

(Core :: Audio/Video: GMP, defect, P1, critical)

x86
Windows NT
defect

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox47 --- fixed
firefox48 --- fixed
firefox49 --- fixed

People

(Reporter: jimm, Assigned: jimm)

References

Details

(Keywords: crash)

Crash Data

Attachments

(2 files, 1 obsolete file)

+++ This bug was initially created as a clone of Bug #1258317 +++

This signature breaks down into two sub groups, this bug tracks GMP related issues. In particular 1.a, which should be pretty easy to track down.

1) Parent side crashes during GMP init on a background thread, made up of two sub groups: 

a) 55 (18%) Crashes that happen during a shutdown race where the main thread is tearing down the GMP child process at the same time the child is initializing.

b) 38 (12%) A generic TransferHandleToProcess, mostly associated with GMP.

2) 244 (~66%) Failures to transfer a handle in the child, mostly during the init of PBackground::Open. (We don't have GetLastError data for these yet.)
Maire, is this something your team might be able to pick up?
Flags: needinfo?(mreavy)
Summary: crash in mozilla::ipc::TransferHandleToProcess → crash in mozilla::ipc::TransferHandleToProcess (GMP related)
Chris -- Can you (or someone you designate) take a look at this and identify where the bug is and what needs to get fixed?  The fact that we are init'ing GMP during Shutdown sounds problematic.  Thanks.
Flags: needinfo?(mreavy) → needinfo?(cpearce)
jimm: How did you conclude that the 1a crashes here were caused by shutdown? For example, this parent crash: https://crash-stats.mozilla.com/report/index/599a2ee5-43ff-49ae-b3fb-dfa452160416#allthreads doesn't have anything on the main thread call stack to indicate that shutdown has begun.
Flags: needinfo?(jmathies)
(In reply to Chris Pearce (:cpearce) from comment #3)
> jimm: How did you conclude that the 1a crashes here were caused by shutdown?
> For example, this parent crash:
> https://crash-stats.mozilla.com/report/index/599a2ee5-43ff-49ae-b3fb-
> dfa452160416#allthreads doesn't have anything on the main thread call stack
> to indicate that shutdown has begun.

I was looking at crashes that have ShutdownProgress set.

https://crash-stats.mozilla.com/search/?product=Firefox&signature=~TransferHandleToProcess&_facets=signature&_facets=shutdown_progress&_facets=system_error&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#crash-reports

In this search I see 87 crashes, of which 26 have something set in ShutdownProgress (profile-change-teardown), most (95%?) have a crashing background thread initializing GMP.

https://crash-stats.mozilla.com/search/?product=Firefox&signature=~TransferHandleToProcess&shutdown_progress=change&_facets=signature&_facets=shutdown_progress&_facets=system_error&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#crash-reports
Flags: needinfo?(jmathies)
Sounds like GMP shutdown can take a long time, which might explain how we can get caught trying to init gmp during shut down.

http://mxr.mozilla.org/mozilla-central/source/dom/media/gmp/GMPServiceParent.cpp#363

Should we check if we're shutting down in GMPServiceParent::RecvLoadGMP through mService? That points to GeckoMediaPluginServiceParent which has a variable mWaitingForPluginsSyncShutdown that tracks observer shutdown notifications. Then we just need to be sure GMP code handles a failure in RecvLoadGMP gracefully.

This might not catch all these failures, but it should help substantially assuming that shutdown execution in the observer takes "a long time".
Attached patch wip (obsolete) — Splinter Review
This needs to be tested.
Attachment #8742892 - Flags: feedback?(cpearce)
Flags: needinfo?(cpearce)
Comment on attachment 8742892 [details] [diff] [review]
wip

Review of attachment 8742892 [details] [diff] [review]:
-----------------------------------------------------------------

Seems reasonable to me.
Attachment #8742892 - Flags: feedback?(cpearce) → feedback+
Comment on attachment 8742892 [details] [diff] [review]
wip

ok :), can I get an r+ on it?
Attachment #8742892 - Flags: review?(cpearce)
Assignee: nobody → jmathies
Blocks: e10s-crashes
No longer depends on: 1258317
Comment on attachment 8742892 [details] [diff] [review]
wip

Review of attachment 8742892 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/gmp/GMPServiceParent.cpp
@@ +690,5 @@
> +{
> +  MOZ_ASSERT(NS_GetCurrentThread() == mGMPThread);
> +  return mShuttingDownOnGMPThread;
> +}
> +

Nit: Extra line break here.

::: layout/generic/test/mochitest.ini
@@ +115,5 @@
>  [test_overflow_event.html]
>  [test_page_scroll_with_fixed_pos.html]
>  skip-if = buildapp == 'mulet' || (buildapp == 'b2g' && (toolkit != 'gonk' || debug)) # b2g-debug(opened window too small?) b2g-desktop(opened window too small?)
>  support-files = page_scroll_with_fixed_pos_window.html
>  [test_plugin_clipping.xhtml]

These tests look like they're NPAPI plugin tests, and not GMP related. Seems to me that the change to re-enable these tests should be in a different bug, since it seems unlikely (or at lest unbelievable) that the change you're making would affect these tests.
Attachment #8742892 - Flags: review?(cpearce) → review+
> ::: layout/generic/test/mochitest.ini
> @@ +115,5 @@
> >  [test_overflow_event.html]
> >  [test_page_scroll_with_fixed_pos.html]
> >  skip-if = buildapp == 'mulet' || (buildapp == 'b2g' && (toolkit != 'gonk' || debug)) # b2g-debug(opened window too small?) b2g-desktop(opened window too small?)
> >  support-files = page_scroll_with_fixed_pos_window.html
> >  [test_plugin_clipping.xhtml]
> 
> These tests look like they're NPAPI plugin tests, and not GMP related. Seems
> to me that the change to re-enable these tests should be in a different bug,
> since it seems unlikely (or at lest unbelievable) that the change you're
> making would affect these tests.

Sorry about that, this leaked through from some other e10s work in my queue.
Attached patch patchSplinter Review
Attachment #8742892 - Attachment is obsolete: true
Attachment #8745431 - Flags: review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/826dedbdebdc
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Target Milestone: mozilla48 → mozilla49
Depends on: 1268714
See Also: → 1258317
Comment on attachment 8745431 [details] [diff] [review]
patch

Approval Request Comment
[Feature/regressing bug #]:
e10s related crash fix (need to include the follow up in bug 126874)
[User impact if declined]:
more crashy e10s
[Describe test coverage new/current, TreeHerder]:
it's been on nightly now since 4-27
[Risks and why]: 
modest
[String/UUID change made/needed]:
none
Attachment #8745431 - Flags: approval-mozilla-aurora?
Marking affected for 47 and 48 based on crash-stats
Comment on attachment 8745431 [details] [diff] [review]
patch

Less crashy e10s sounds good, please uplift to aurora.
Attachment #8745431 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Note, followup issue is in bug 1268714
shoot, sorry. I was thinking of another bug similar to this one.
See Also: → 1271601
Hi Jim, do we need this on Beta47? Just wondering.
Flags: needinfo?(jmathies)
Comment on attachment 8745431 [details] [diff] [review]
patch

Approval Request Comment
[Feature/regressing bug #]:
e10s related crash fix (need to include the follow up in bug 126874)
[User impact if declined]:
more crashy e10s
[Describe test coverage new/current, TreeHerder]:
it's been on nightly and aurora for a while
[Risks and why]: 
low
[String/UUID change made/needed]:
none
Flags: needinfo?(jmathies)
Attachment #8745431 - Flags: approval-mozilla-beta?
Comment on attachment 8745431 [details] [diff] [review]
patch

I looked at the data on Nightly and Aurora and we seem to be hitting this crash still. I didn't dig into the data more to see if the trends shows a reduction in crash volume since the fix. Still seems reasonable to uplift, Beta47+
Attachment #8745431 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Ritu Kothari (:ritu) from comment #23)
> Comment on attachment 8745431 [details] [diff] [review]
> patch
> 
> I looked at the data on Nightly and Aurora and we seem to be hitting this
> crash still. I didn't dig into the data more to see if the trends shows a
> reduction in crash volume since the fix. Still seems reasonable to uplift,
> Beta47+

This is a two parter, bug 1258317 covers another class of crashes with the same signature. I checked nightly and found one crash related to gmp that snuck through before we entered shutdown. This was a substantial improvement compared to prior to this work landing. Hence why I marked it for uplift. bug 1271601 also help with any remaining issues, although we're not blocking on that work.
has problems uplifting to beta:

grafting 343679:61a1435b8d61 "Bug 1263951 - Avoid trying to initialize new GMP instances once the browser enters shutdown. r=cpearce a=lizzard"
merging dom/media/gmp/GMPServiceParent.cpp
merging dom/media/gmp/GMPServiceParent.h
warning: conflicts while merging dom/media/gmp/GMPServiceParent.h! (edit, then use 'hg resolve --mark')
abort: unresolved conflicts, can't continue
(use 'hg resolve' and 'hg graft --continue')
Flags: needinfo?(jmathies)
Attached patch beta patchSplinter Review
fixes a minor merge conflict, nothing major.
Flags: needinfo?(jmathies)
You need to log in before you can comment on or make changes to this bug.