Closed Bug 1164264 Opened 5 years ago Closed 5 years ago

[EME] Async shutdown hang in mozilla::gmp::GeckoMediaPluginService::Observe(...) - Don't wait if there are no plugins remaining

Categories

(Core :: Audio/Video, defect, P2, critical)

x86
Windows NT
defect

Tracking

()

VERIFIED FIXED
mozilla41
Tracking Status
firefox39 + fixed
firefox40 + verified
firefox41 + verified

People

(Reporter: cpearce, Assigned: gerald)

References

(Blocks 1 open bug)

Details

(Keywords: crash)

Crash Data

Attachments

(1 file, 1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-11033bfc-c96f-4813-8468-83b312150507.
=============================================================

We're hanging in this loop in mozilla::gmp::GeckoMediaPluginService::Observe(...):

    while (mWaitingForPluginsAsyncShutdown) {
      NS_ProcessNextEvent(NS_GetCurrentThread(), true);
    }

We should really just use nsIAsyncShutdown here.
It would be nice to have a consistent behavior for all plugins to employ async shutdown even when it has nothing to do. And we can remove GMPAsyncShutdown/GMPAsyncShutdownHost.
Note: Bug 1057908 (part 1) overhauled GMPService and moved the offending code to GMPServiceParent.cpp.
Test patch, not for release: Simulate async shutdown hang in gmp-clearkey, by never notifying 'ShutdownComplete' when asked to begin shutdown.

Tried with clearkey test video on Mac; I can see the hang for 3 seconds, then the parent aborts shutdown and just closes everything as expected, and allows Firefox to exit; I.e., I could not reproduce the issue.
In case there are no plugins, no shutdown activity will take place and therefore mWaitingForPluginsAsyncShutdown will never be toggled back to true, hence the never-ending 'while(mWaitingForPluginsAsyncShutdown)'.
This would explain why the shutdown timeout in GMPParent didn't fire as would have been expected - it was never started in the first place!

This patch simply sets mWaitingForPluginsAsyncShutdown to false if there are no plugins, skipping the following blocking while-loop.

(The while-loop itself could have been moved inside the if-then block, but I wanted to keep the patch as simple as possible, for easier uplift if needed.)


If shutdown-hangs still happen after this, the next possible step to try would be to add a timeout in GMPService itself, instead of trusting GMPParent's to timeout by themselves.
Assignee: nobody → gsquelart
Attachment #8606840 - Attachment is obsolete: true
Attachment #8608539 - Flags: review?(cpearce)
Attachment #8608539 - Flags: review?(cpearce) → review+
We should uplift this.
Flags: needinfo?(cpearce)
https://hg.mozilla.org/mozilla-central/rev/555d78bff49c
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Comment on attachment 8608539 [details] [diff] [review]
1164264-GMPService-shutdown-no-plugins-left.patch

Approval Request Comment
[Feature/regressing bug #]: EME
[User impact if declined]: We've seen some crash reports that are induced due to shutdown hangs where Firefox is waiting for Gecko Media Plugins to shutdown. We think this patch may fix that shutdown hang. To user impact if declined is a known shutdown hang will not go away.
[Describe test coverage new/current, TreeHerder]: Been on m-c for a few days.
[Risks and why]: Low; makes us not wait for plugins to shutdown if there are no plugins...
[String/UUID change made/needed]: None.
Attachment #8608539 - Flags: approval-mozilla-beta?
Attachment #8608539 - Flags: approval-mozilla-aurora?
Comment on attachment 8608539 [details] [diff] [review]
1164264-GMPService-shutdown-no-plugins-left.patch

This meets the Beta uplift bar as shutdown hangs are critical to the release quality. Also adding approval for Aurora uplift. Enabling QE-Verify flag.
Attachment #8608539 - Flags: approval-mozilla-beta?
Attachment #8608539 - Flags: approval-mozilla-beta+
Attachment #8608539 - Flags: approval-mozilla-aurora?
Attachment #8608539 - Flags: approval-mozilla-aurora+
Really enabling QE-Verify flag.
Flags: qe-verify+
Adding a tracking flag for FF39, FF40 and FF41.
Flags: needinfo?(cpearce)
Couldn't reproduce with Nightly from 2015-05-15 under Windows 7 64-bit nor Mac OS X 10.9.5 with http://people.mozilla.org/~cpearce/mse-clearkey/ test video - no hang/crash encountered.

Although, in Socorro there are 7 reports for 39.0b3 version with this signature: 
https://crash-stats.mozilla.com/report/list?signature=shutdownhang%20%7C%20WaitForSingleObjectEx%20%7C%20WaitForSingleObject%20%7C%20PR_Wait%20%7C%20mozilla%3A%3AReentrantMonitor%3A%3AWait(unsigned%20int)%20%7C%20nsThread%3A%3AProcessNextEvent(bool%2C%20bool*)%20%7C%20NS_ProcessNextEvent(nsIThread*%2C%20bool)%20%7C%20mozilla%3A%3Agmp%3A%3AGeckoMediaPluginService%3A%3AObserve(nsIS...#tab-reports

Gerald, any ideas about those reports? Also, could you please point me to the test video mentioned in comment 3? Thanks in advance!
Flags: needinfo?(gsquelart)
(In reply to Alexandra Lucinet, QA Mentor [:adalucinet] from comment #15)
> Couldn't reproduce with Nightly from 2015-05-15 under Windows 7 64-bit nor
> Mac OS X 10.9.5 with http://people.mozilla.org/~cpearce/mse-clearkey/ test
> video - no hang/crash encountered.
> 
> Although, in Socorro there are 7 reports for 39.0b3 version with this
> signature: 
> https://crash-stats.mozilla.com/report/
> list?signature=shutdownhang%20%7C%20WaitForSingleObjectEx%20%7C%20WaitForSing
> leObject%20%7C%20PR_Wait%20%7C%20mozilla%3A%3AReentrantMonitor%3A%3AWait(unsi
> gned%20int)%20%7C%20nsThread%3A%3AProcessNextEvent(bool%2C%20bool*)%20%7C%20N
> S_ProcessNextEvent(nsIThread*%2C%20bool)%20%7C%20mozilla%3A%3Agmp%3A%3AGeckoM
> ediaPluginService%3A%3AObserve(nsIS...#tab-reports
> 
> Gerald, any ideas about those reports? Also, could you please point me to
> the test video mentioned in comment 3? Thanks in advance!

I'm using the same test video as the one you've mentioned above.

I believe my patch here was necessary, as it fixed a real potential cause of hangs.
Now that it has been applied, and we are still getting reports from the same location (waiting for GMPs to shutdown), at least we know the problem is not that we are wrongly waiting for nothing!

Unfortunately I cannot see an obvious reason for the hangs, as it may be happening in the plugin itself, whose information is not included in these reports.

The strange thing is that there are already timeouts around GMP handling, which should prevent these long hangs. As per my comment 4, my next recommended step would be to add a timeout in GMPService itself and make it crash the plugin so that hopefully we can get more information that way.

Chris, thoughts?
Should we open a separate bug to continue working on these reports?
Flags: needinfo?(gsquelart) → needinfo?(cpearce)
(After discussion with Chris)
Keeping this bug 'fixed' to track its patch.
Opened separate bug 1173634 to try other potential fixes, or at least try to get more information from reports.
Flags: needinfo?(cpearce)
Summary: [EME] Async shutdown hang in mozilla::gmp::GeckoMediaPluginService::Observe(...) → [EME] Async shutdown hang in mozilla::gmp::GeckoMediaPluginService::Observe(...) - Don't wait if there are no plugins remaining
(In reply to Alexandra Lucinet, QA Mentor [:adalucinet] from comment #18)
> No new crash reports for 40 and 41 in the last 2 weeks [1]. Marking
> accordingly.
> 
> [1]
> https://crash-stats.mozilla.com/report/
> list?range_unit=days&range_value=14&signature=shutdownhang+|+WaitForSingleObj
> ectEx+|+WaitForSingleObject+|+PR_Wait+|+mozilla%3A%3AReentrantMonitor%3A%3AWa
> it%28unsigned+int%29+|+nsThread%3A%3AProcessNextEvent%28bool%2C+bool*%29+|+NS
> _ProcessNextEvent%28nsIThread*%2C+bool%29+|+mozilla%3A%3Agmp%3A%3AGeckoMediaP
> luginService%3A%3AObserve%28nsIS...#tab-reports

Unfortunately, I believe the issue is still there, but due to code changes the signature has changed.
See bug 1183971 for crashes in 39 and bug 1183972 in 40.
You need to log in before you can comment on or make changes to this bug.