Closed Bug 1135562 Opened 10 years ago Closed 9 months ago

shutdownhang in MediaShutdownManager::Shutdown() with threads in MediaDecoderStateMachine::StopAudioThread() and wasapi_stream_init()

Categories

(Core :: Audio/Video: Playback, defect, P3)

36 Branch
x86
Windows NT
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox37 + wontfix
firefox38 + wontfix
firefox39 + wontfix
firefox40 + wontfix
firefox41 + wontfix
firefox42 + wontfix
firefox44 --- wontfix
firefox45 - affected
firefox-esr38 --- affected

People

(Reporter: karlt, Assigned: kinetik)

References

Details

(Keywords: crash, hang, Whiteboard: qa-not-actionable)

Crash Data

+++ This bug was initially created as a clone of Bug #1129455 +++ https://crash-stats.mozilla.com/report/index/f52b81d5-6ef4-44ea-96de-149272150219 Roughly half the shutdown hangs in MediaShutdownManager::Shutdown() in 36.0b10 (20150217132925) and in beta channel build id 20150219174507 have one thread in MediaDecoderStateMachine::StopAudioThread() and one or more threads in wasapi_stream_init(). As an extreme example, https://crash-stats.mozilla.com/report/index/f52b81d5-6ef4-44ea-96de-149272150219 has 10 threads in wasapi_stream_init(). Some links from bug 1129455 Beta: https://crash-stats.mozilla.com/search/?product=Firefox&release_channel=beta&signature=~mozilla%3A%3AMediaShutdownManager%3A%3AShutdown&_facets=build_id&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-build_id Aurora: https://crash-stats.mozilla.com/search/?product=Firefox&release_channel=aurora&signature=~mozilla%3A%3AMediaShutdownManager%3A%3AShutdown&_facets=build_id&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-build_id Overall stats https://crash-stats.mozilla.com/report/list?signature=shutdownhang%20%7C%20WaitForSingleObjectEx%20%7C%20WaitForSingleObject%20%7C%20PR_Wait%20%7C%20nsThread%3A%3AProcessNextEvent%28bool%2C%20bool%2A%29%20%7C%20NS_ProcessNextEvent%28nsIThread%2A%2C%20bool%29%20%7C%20mozilla%3A%3AMediaShutdownManager%3A%3AShutdown%28%29+++ This bug was initially created as a clone of Bug #1129455 +++
https://bugzilla.mozilla.org/show_bug.cgi?id=1129455#c6 implies that this hang is not as frequent as earlier betas, but that may be because YouTube is now using Flash instead of HTML5 video.
Blocks: MSE
These changes were backed out of beta and aurora, per bug 1133190 comment 40 and comment 42.
(In reply to Matthew Gregan [:kinetik] from comment #2) > These changes were backed out of beta and aurora, per bug 1133190 comment 40 > and comment 42. Thanks. The shutdown hang in https://crash-stats.mozilla.com/report/index/f52b81d5-6ef4-44ea-96de-149272150219 for example was in build (36.0b10) that includes the backout of 82339d98aa6a.
Hi Paul. Do you have any ideas what might cause threads to wait in AlpcSendWaitReceivePort()? Are any of the changes that have landed on Nightly likely to help here?
Flags: needinfo?(padenot)
Sorry, so it does; I didn't pay close enough attention to the backout vs build dates. Looking at the near-identical stacks of the ten threads inside wasapi_stream_init makes me wonder about bug 1103824 (despite the crash report originating from a Win 7 machine and that note being about Win 8).
(In reply to Karl Tomlinson (:karlt) from comment #4) > Hi Paul. Do you have any ideas what might cause threads to wait in > AlpcSendWaitReceivePort()? Not really. I just tried a bunch of stuff: - Reading the disassembled code check if `IAudioClient::Initialize` calls `AudioServerInitialize` only the first time or each time: it's called every time, so this might not be the STA vs. MTA first stream creation issue problem (but what do I know?) - Trying to enable/disable audio devices while playing. Two behaviours, here: - if `cubeb_init` has not been called, it goes in `cubeb_init` and no audio device is available so `wasapi_init` fails, cubeb tries to use winmm, which does not check if there is an audio device, and it appears to deadlock between the winmm thread, some thread trying to shut the stream down, and the MediaShutdownManager on the main thread (minidump: http://paul.cx/public/winmm-shutdown.dmp, it's a bit big to attach here. This is for revision 368c62292249). - if cubeb_init was already called, and it chose WASAPI, then future playbacks just fail. - Killing audiodg.exe while a stream is running, it works fine as well, sound stops, everything cleans up fine after CC - Killing audiodg.exe while in the middle of setting up the stream (single-stepping in assembly mode), right at the stack trace Karl mentions I have to say I'm quite confused about this one. I tried searching the web for similar problems and stacks, and I see gecko crash reports with kind of the same stacks, from before I implemented the WASAPI backend (I think WinMM is implemented with WASAPI things underneath these days, which would explain that), and this [0], which seems to be the same issue (albeit for a capture device, not a playback device). [0]: https://social.msdn.microsoft.com/Forums/windowsapps/en-US/05f1693d-4174-4d1e-b6f9-4acaac3cc05d/wasapi-iaudioclientinitialize-hanging-when-opening-audio-capture-device?forum=tailoringappsfordevices > Are any of the changes that have landed on Nightly likely to help here? I don't think so, those patches are not really changing the way we open streams.
Flags: needinfo?(padenot)
Sheila, do you think this is still a p1 blocker for MSE? If so, can you help us find an owner to fix this by Thursday when we go to build for beta 7? Thanks!
Flags: needinfo?(smooney)
I reviewed this with the media team today. While we'd like a fix and will continue to pursue one, this is not a blocker for 37. I'm marking 37 as wontfix.
These signatures don't look like a top crash for 37 - there are a few hundred for the last 7 days.
Keywords: crash
Anthony, do you think we should expect for a fix for this bug in 38? Thanks
Flags: needinfo?(ajones)
The patch in Bug 1155432 may help with some of the hangs, but not with the ones related to problems with the audio playback stalling.
I filed Socorro bug 1156172 to try to get some better data on the severity and possible correlations for this particular hang.
Depends on: 1156172
Too late for 38. Any news for 39?
Assignee: nobody → kinetik
Status: NEW → ASSIGNED
(In reply to Paul Adenot (:padenot) from comment #6) > - Killing audiodg.exe while in the middle of setting up the stream > (single-stepping in assembly mode), right at the stack trace Karl mentions It's possible that the IAudioStreamVolume leak that was fixed and then refixed (bug 1134977) was causing audiodg.exe to exhaust address space and crash, exacerbating this issue. That was fixed back at the start of April, so the hang stats should reflect an improvement after that if it's the issue. Without much else to go on, I'm preparing a patch that follows the COM rules mentioned in the IAudioClient documentation on MSDN. It may or may not help with this hang, but at the least removing it as a possible contributor is worthwhile.
I'm fixing the COM issue in the bug specific to that (bug 1103824) since I am no more certain than "hopeful" that it'll help with these hangs.
Depends on: 1103824
This stack is a bit different, with many threads in MediaDecoderStateMachine::StopAudioThread() but only one visibly in wasapi_stream_destroy(). https://crash-stats.mozilla.com/report/index/436c2009-fc1d-4ea5-b231-7d7f62150516#allthreads
Matthew, is this crash still happening and do you think it's affecting 40 and 41? Which signature or stack represents this issue now ? I'm asking because I'm tracking the bug for 39, and not sure where you might want to uplift a fix, or how we can verify that it's fixed.
Flags: needinfo?(kinetik)
(In reply to Liz Henry (:lizzard) from comment #17) > Matthew, is this crash still happening and do you think it's affecting 40 > and 41? Yes, and it will be. The patch in bug 1103824 bounced and I haven't solved the remaining issue yet. > Which signature or stack represents this issue now ? I'm asking because I'm > tracking the bug for 39, and not sure where you might want to uplift a fix, > or how we can verify that it's fixed. Since we don't have a way to reproduce or even an idea what the real cause is, the only indicator that there has been a fix or improvement will be a reduction in hangs in wasapi_stream_init, but unfortunately it doesn't show up as that in the crash/hang reporter, but as the generic hang under MediaShutdownManager::Shutdown.
Flags: needinfo?(kinetik)
OK, marking wontfix for 39. We will cross our fingers hopefully as you all continue to unknot the other shutdown hangs. Thanks for the extra details Matthew!
Flags: needinfo?(smooney)
Flags: needinfo?(ajones)
The summary of the bug in the Chromium bug tracker I just added to "see also" is that they're hitting the same IAudioClient::Initialize hang in some cases, took a few shots at fixing it (including a similar COM initialization change to bug 1103824) and haven't had a great deal of luck finding a solution yet either.
Dale from Chromium here, found your bug via search. Yeah, we have no ideas about this :( Matthew van Eerde from Microsoft ( http://blogs.msdn.com/b/matthew_van_eerde/ ) has requested the following: "If you hit a hang in audioses.dll calling into rpcrt4.dll, can you take a dump of the svchost.exe process containing the AudioSrv service? I'm happy to investigate from there." https://code.google.com/p/chromium/issues/detail?id=422522#c150 I'll update here if we find anything else out.
Matthew, this crash signature is showing up with 40.0.2 (~3000 crashes) and 40.0.3. Based on the comments above, I do not think we can do much here. Or is there a speculative fix that might help? Thanks.
Flags: needinfo?(kinetik)
Unfortunately I don't have a speculative fix for this. The one I did have (bug 1103824) bounced due to a regression, and after that I discovered the Chrome guys had tried the same fix with no positive result. Having said that, 3000 crashes seems like a lot, is there a way we can work out the magnitude of severity compared to the same hang being seen in Chrome (which, if I understood their bug correctly, sounded less severe than what we're seeing)?
Flags: needinfo?(kinetik)
I can't go into exact numbers, but ~8% of our crashes on our dev channel come from this issue; it's definitely a top 5 crasher for us.
David, would you be able to address Matthew's question in comment 24? I am wondering if this crash will be fixed anytime soon for 41 or it is already too late and I can wontfix it for 41.
Flags: needinfo?(dmajor)
This signature is under 1% of crashes on 40.0.3 release. (Referring to raw crash counts misses out on some perspective) Plus my understanding of bug 1156172 is that only a fraction of the volume is due to the wasapi issue anyway. At this point I think we have bigger fish to fry. We might as well drop the tracking flags. Rest assured that the stability team will make noise and re-request tracking if this becomes a bigger problem.
Flags: needinfo?(dmajor)
Component: Audio/Video → Audio/Video: Playback
Same for 42.
not an MSE bug
No longer blocks: MSE
Crash Signature: [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::MediaShutdownManager::Shutdown()] [@ shutdownhang | ntdll.dll@0x3c6bc] → [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::MediaShutdownManager::Shutdown()] [@ shutdownhang | ntdll.dll@0x3c6bc] [@ shutdownhang |…
Priority: P1 → P2
I agree that it would be great to fix it but we have been wontfixing it for a long time. Not sure what would be the point of tracking it anymore...
I don't think there's anything you can do about this crash. At least we over at Chrome haven't found anything and Microsoft has explicitly said they don't think a client application can do anything to workaround this. It's due to misbehaving OS level drivers causing a deadlock. See comment#21 for more details.
Thanks Dale, sad state of affair, but at least now we know where we stand.
See Also: → 1267752
Mass change P2 -> P3
Priority: P2 → P3
See Also: → 1295663
Crash Signature: mozilla::MediaShutdownManager::Shutdown] → mozilla::MediaShutdownManager::Shutdown] [@ AsyncShutdownTimeout | profile-before-change | MediaShutdownManager: shutdown]
See Also: → 1308629
Bug 1308629 is the Mac version of this crash.
See Also: → 1328147
Status: ASSIGNED → NEW
Whiteboard: qa-not-actionable
Severity: critical → S2

Since the crash volume is low (less than 5 per week), the severity is downgraded to S3. Feel free to change it back if you think the bug is still critical.

For more information, please visit auto_nag documentation.

Severity: S2 → S3
Crash Signature: , bool) | mozilla::MediaShutdownManager::Shutdown()] [@ shutdownhang | ntdll.dll@0x3c6bc] [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | nsThread::ProcessNextEvent | NS_ProcessNextEvent | → , bool) | mozilla::MediaShutdownManager::Shutdown] [@ shutdownhang | ntdll.dll@0x3c6bc] [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | nsThread::ProcessNextEvent | NS_ProcessNextEvent |

Closing because no crashes reported for 12 weeks.

Status: NEW → RESOLVED
Closed: 9 months ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.