Closed Bug 1268434 Opened 4 years ago Closed 4 years ago

[GMP] crash in mozilla::InvokeAsync<T>

Categories

(Core :: Audio/Video: GMP, defect, P1, critical)

47 Branch
x86
Windows NT
defect

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox47 + fixed
firefox48 --- fixed
firefox49 --- fixed

People

(Reporter: philipp, Assigned: gerald)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is 
report bp-08330689-de62-4811-871d-b1bdb2160428.
=============================================================
Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::InvokeAsync<mozilla::MozPromise<bool, enum nsresult, 0>, mozilla::gmp::GeckoMediaPluginServiceParent, nsString, nsString&> 	xpcom/threads/MozPromise.h
1 	xul.dll 	mozilla::gmp::GeckoMediaPluginServiceParent::AsyncAddPluginDirectory(nsAString_internal const&) 	dom/media/gmp/GMPServiceParent.cpp
2 	xul.dll 	mozilla::gmp::GeckoMediaPluginServiceParent::Observe(nsISupports*, char const*, wchar_t const*) 	dom/media/gmp/GMPServiceParent.cpp
3 	xul.dll 	mozilla::gmp::GeckoMediaPluginServiceParent::AddPluginDirectory(nsAString_internal const&) 	dom/media/gmp/GMPServiceParent.cpp
4 	xul.dll 	XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) 	js/xpconnect/src/XPCWrappedNative.cpp
5 	xul.dll 	js::Invoke(JSContext*, JS::CallArgs const&, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp

in very early data for 47.0b1 this signature is at #20 with 0.8% of all crashes.
maybe it's related to the work from bug 1245789.
Anthony -- Since cpearce is on PTO, can you find someone to look at this?  This is related to EME, not OpenH264.  Thanks.
Rank: 10
Component: Audio/Video → Audio/Video: GMP
Flags: needinfo?(ajones)
Priority: -- → P1
Gerald is looking into another crash so it will have to wait until cpearce gets back.
Flags: needinfo?(ajones) → needinfo?(cpearce)
I'm baking tries for the other crash, so I can take a look at this one now.
Assignee: nobody → gsquelart
Flags: needinfo?(cpearce)
[Tracking Requested - why for this release]:
this is a recent crash regression so we shouldn't lose track of it...
It is not obvious by looking at the code how we guarantee GeckoMediaPluginService::GetAbstractGMPThread() (which returns mAbstractGMPThread) is sequenced after GeckoMediaPluginService::GetThread() (which initializes mAbstractGMPThread).
(In reply to JW Wang [:jwwang] from comment #5)
> It is not obvious by looking at the code how we guarantee
> GeckoMediaPluginService::GetAbstractGMPThread() (which returns
> mAbstractGMPThread) is sequenced after GeckoMediaPluginService::GetThread()
> (which initializes mAbstractGMPThread).

The only way (I can see) to get hold of a GeckoMediaPluginService is through GeckoMediaPluginService::GetGeckoMediaPluginService(), which constructs the parent or child side, and Init()'s it, which calls the first GetThread() that initializes mAbstractGMPThread.
This would have to be done before non-abstract methods like GetAbstractThread() can be called.

And if 'this' was null in the AddPluginDirectory() call (because GetGeckoMediaPluginService was not used), I would think we'd see crashes before the InvokeAsync() call, probably in 'RefPtr<AbstractThread> thread(GetAbstractGMPThread())' when reading mAbstractGMPThread or trying to AddRef its pointee.

But from the crash reports, it still kind of makes most sense that this (mAbstractThread being null) would be what is wrong.
If we cannot find the cause soon, I could start adding assertions/diagnostics to verify my assumptions above.


Chris, welcome back! Would you like to have a quick look?
Flags: needinfo?(cpearce)
After discussion with Chris, we've noticed that the thread pointer returned from GetAbstractThread() is not checked, even though the 'Get' prefix suggests it could very well be null. E.g., when shutdown happens.
Though it might not be this exact situation here, the checks should be done regardless, so I'll add that anyway, and we'll see if that solves this issue as well.
Flags: needinfo?(cpearce)
More truthfully: Chris first noticed the missing checks.
GeckoMediaPluginService::mAbstractThread was not reset as expected from
ShutdownGMPThread, meaning it would retain a reference to the GMP thread, and
it would allow dispatch attempts to the GMP thread after shutdown.

Also mAbstractThread was not protected by a mutex (as mGMPThread was), which is
definitely needed now that it can be reset at shutdown time.

As its prefix implies, GetAbstractThread could return a nullptr, so it should
be checked before every use.
Note that this GetAbstractThread call (and its check) has been moved closer to
the start of functions using it, to avoid unnecessary and potentially invariant-
breaking partial work to take place when we can know in advance that it won't
fully succeed because the GMP thread is not available.

Review commit: https://reviewboard.mozilla.org/r/51077/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/51077/
Comment on attachment 8749642 [details]
MozReview Request: Bug 1268434 - Mutex-protect and check GMP abstract thread before uses - c?cpearce

https://reviewboard.mozilla.org/r/51077/#review47965
Reminder to uplift to 47.
Flags: needinfo?(gsquelart)
https://hg.mozilla.org/mozilla-central/rev/b1146a78d39f
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Rebase of already-r+'d patch, for Aurora&Beta uplift (if approved).

Approval Request Comment
[Feature/regressing bug #]: Widevine bug 1245789 and its uplift bug 1265270.

[User impact if declined]: ~30 startup crashes per day on 47+.

[Describe test coverage new/current, TreeHerder]: Media tests in aurora try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=cafcf0151d5ef54143a8d151ebe98872ac6f38fb and beta try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=913e1bc4cab336d70eea0d8e2a6ec5f304f839b2

[Risks and why]: Low.
The patch is expanding the use of an existing mutex to cover a member that should have been protected anyway (I think the code was relying on that member being read-only, which was not quite 100% correct). Main risks: Deadlocks (should not happen as this existing mutex doesn't mingle with others) and re-locking (I've examined all paths using this mutex, and fixed the only path I could find where the mutex was re-acquired).
Other changes have very low risks: Making more null-checks before using that member, and actually resetting it when it should not be used anymore.

[String/UUID change made/needed]: None
Flags: needinfo?(gsquelart)
Attachment #8750508 - Flags: review+
Attachment #8750508 - Flags: approval-mozilla-beta?
Attachment #8750508 - Flags: approval-mozilla-aurora?
Comment on attachment 8750508 [details] [diff] [review]
1268434-aurora-beta.patch

This crash didn't occur on Nightly so there is no validation on that front but this is still worth uplifting. Aurora48+, Beta47+
Attachment #8750508 - Flags: approval-mozilla-beta?
Attachment #8750508 - Flags: approval-mozilla-beta+
Attachment #8750508 - Flags: approval-mozilla-aurora?
Attachment #8750508 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.