Closed Bug 1313258 Opened 8 years ago Closed 7 years ago

Intermittent FATAL ERROR: AsyncShutdown timeout in xpcom-will-shutdown Conditions: [{"name":"MediaShutdownManager: shutdown","state":"(none)","filename":"/builds/slave/autoland-m64-d-000000000000000/build/src/dom/media/MediaShutdownManager.cpp","lineNumbe

Categories

(Core :: Audio/Video: GMP, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla53
Tracking Status
firefox50 --- unaffected
firefox51 --- unaffected
firefox52 --- wontfix
firefox-esr52 --- affected
firefox53 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: padenot)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

Maybe a GMP shutdown hang?

 10:22:39     INFO - WARNING: At least one completion condition is taking too long to complete. Conditions: [{"name":"GeckoMediaPluginServiceParent: shutdown","state":"(none)","filename":"/builds/slave/autoland-m64-d-000000000000000/build/src/dom/media/gmp/GMPServiceParent.cpp","lineNumber":1884,"stack":"GeckoMediaPluginServiceParent shutdown"}] Barrier: xpcom-will-shutdown
Component: Audio/Video → Audio/Video: GMP
Rank: 35
Priority: -- → P3
This failure appears to be closely tied to the spike in bug 1124970, which recently had its needinfo cleared for reasons I won't claim to understand.
This is bug 1317234 also I think?
Flags: needinfo?(achronop)
I am afraid it is not the same. It looks like a deadlock but none of the threads are blocked in the same point as in Bug 1317234. From a quick pass of the back traces I see that most of the treads are blocked here:

libnss3.dylib!PR_WaitCondVar [ptsynch.c:ef9b2a0ba51b : 396 + 0x8]

I hope that helps.
Flags: needinfo?(achronop)
In today's channel meeting, it was mentioned that we're seeing MSG-related shutdown hangs in the wild across all platforms. I wonder if this is the same root problem?
https://wiki.mozilla.org/Firefox/Channels/Meetings/2017-01-10#Release
Fixed on trunk by bug 1331869. Still hitting on Beta, though :(
Status: NEW → RESOLVED
Closed: 7 years ago
Depends on: 1331869
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Paul, have you had a chance to look into what might have fixed this for Beta?
Flags: needinfo?(padenot)
I had a closer look. This is a different issue related to a GMP process still blocking the shutdown. Easy to put in the same bag, since most of the error message is the same:

20:49:53     INFO - FATAL ERROR: AsyncShutdown timeout in xpcom-will-shutdown Conditions: [{"name":"GeckoMediaPluginServiceParent: shutdown","state":"(none)","filename":"/builds/slave/m-beta-m64-d-00000000000000000/build/src/dom/media/gmp/GMPServiceParent.cpp","lineNumber":1898,"stack":"GeckoMediaPluginServiceParent shutdown"}] At least one completion condition failed to complete within a reasonable amount of time. Causing a crash to ensure that we do not leave the user with an unresponsive process draining resources.

Anthony, do you have someone to look at this or an idea ? This is beta-only.
Flags: needinfo?(padenot) → needinfo?(ajones)
Depends on: 1342117
No longer depends on: 1342117
Looks like we're too late for 52.
Flags: needinfo?(ajones)
(In reply to Anthony Jones (:kentuckyfriedtakahe, :k17e) from comment #24)
> Looks like we're too late for 52.

As much as I can appreciate "defer until it's not an issue" as a winning strategy, we're still stuck with this on ESR52 for the next year if nothing is done.
Flags: needinfo?(ajones)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #25)
> As much as I can appreciate "defer until it's not an issue" as a winning
> strategy, we're still stuck with this on ESR52 for the next year if nothing
> is done.

You know you're busy when you don't even have enough time to delegate.

Chris - this looks like it is right up your alley.
Flags: needinfo?(ajones) → needinfo?(cpearce)
(In reply to Dan Minor [:dminor] from comment #1)
> Maybe a GMP shutdown hang?
> 
>  10:22:39     INFO - WARNING: At least one completion condition is taking
> too long to complete. Conditions: [{"name":"GeckoMediaPluginServiceParent:
> shutdown","state":"(none)","filename":"/builds/slave/autoland-m64-d-
> 000000000000000/build/src/dom/media/gmp/GMPServiceParent.cpp","lineNumber":
> 1884,"stack":"GeckoMediaPluginServiceParent shutdown"}] Barrier:
> xpcom-will-shutdown

I'm seeing this a lot with the ChromiumCDM protocol enabled when running  dom/media/test/test_eme_canvas_blocked.html  only.
I can repro on dom/media/test/test_eme_stream_capture* as well.

The cause of the GMP shutdown hang is that the GMP service in the parent process is waiting in an early shutdown phase for the GMP service in the child process to shutdown, and the GMP service in the child process will only shutdown in the final shutdown phase.

The GeckoMediaPluginServiceParent maintains an nsIAsyncShutdown blocker while it's alive; while its refcount is > 0. When The GeckoMediaPluginServiceParent is destroyed, it releases the shutdown blocker. That's an "xpcom-will-shutdown" blocker. As we'll see, it's impossible for the GMP service to be released in "xpcom-will-shutdown" phase.

For every GMPServiceParent<->GMPServiceChild IPC connection the GMPServiceParent keeps a refptr to the GeckoMediaPluginServiceParent in the main process. So the GMPServiceParent stays alive as long as the connection to the GMPServiceChild in the content process is alive.

The GMP service in the child process is kept alive by a StaticRefPtr singleton (sSingletonService). The refptr is cleared on shutdown by a ClearOnShutdown() with the default phase of ShutdownFinal:
https://dxr.mozilla.org/mozilla-central/rev/272ce6c2572164f5f6a9fba2a980ba9ccf50770c/dom/media/gmp/GMPService.cpp#123

So that means the parent process waits in an "xpcom-will-shutdown" blocker for a shutdown operation in the content process that will happen in in a ClearOnShutdown/ShutdownFinal. That happens here:

https://dxr.mozilla.org/mozilla-central/rev/272ce6c2572164f5f6a9fba2a980ba9ccf50770c/xpcom/build/XPCOMInit.cpp#939

ClearOnShutdown/ShutdownFinal happens after the shutdown-threads directive:

https://dxr.mozilla.org/mozilla-central/rev/272ce6c2572164f5f6a9fba2a980ba9ccf50770c/xpcom/build/XPCOMInit.cpp#898

So the GMP service parent's shutdown in the parent process is waiting for something a shutdown operation in a later stage in the shutdown process to happen the child process.

Also, the parent process uses the same mechanism to keep its reference to the GMP service singleton in the parent process. So I expect we'll not see the GMP service deleted in the parent process anyway until the final shutdown phase.

The MSG is also not shutting down cleanly. I don't understand enough about the MSG to make sense of that yet... Maybe there's something about the GMP shutdown blocking that means MSG doesn't complete shutdown?

Additionally, nsIAsyncShutdown is implemented in JS (so presumably it can't be used very late in the shutdown procedure?), and in the content process only works for "web-workers-shutdown" and "xpcom-will-shutdown".

https://dxr.mozilla.org/mozilla-central/rev/272ce6c2572164f5f6a9fba2a980ba9ccf50770c/toolkit/components/asyncshutdown/AsyncShutdown.jsm#1017

We do need to keep the GMP thread alive long enough for everything that uses it to shutdown GMP, but we need it to be able to shutdown the GMPServiceChild/Parent sooner than the final shutdown phase.
Assignee: nobody → cpearce
Flags: needinfo?(cpearce)
I'm fixing the issue with GMPService failing to unblock shutdown in bug 1352924. The MSG failing to shutdown is a separate issue and will need another fix.
Paul: can you (or someone else) take another look at the MSG failing to async shutdown? I can repro reliably on Windows with `./mach mochitest dom/media/test/test_eme_streams_capture*`. I believe I've fixed the GMP service's problem with async shutdown.
Flags: needinfo?(padenot)
Thanks Chris, I'll take it from there.
Flags: needinfo?(padenot)
Assignee: cpearce → padenot
Comment on attachment 8876188 [details]
Bug 1313258 - Don't create a MediaStreamGraph without doing anything with it.

https://reviewboard.mozilla.org/r/147622/#review151996
Attachment #8876188 - Flags: review?(rjesup) → review+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: