Closed
Bug 1135562
Opened 10 years ago
Closed 1 year ago
shutdownhang in MediaShutdownManager::Shutdown() with threads in MediaDecoderStateMachine::StopAudioThread() and wasapi_stream_init()
Categories
(Core :: Audio/Video: Playback, defect, P3)
Tracking
()
People
(Reporter: karlt, Assigned: kinetik)
References
Details
(Keywords: crash, hang, Whiteboard: qa-not-actionable)
Crash Data
+++ This bug was initially created as a clone of Bug #1129455 +++
https://crash-stats.mozilla.com/report/index/f52b81d5-6ef4-44ea-96de-149272150219
Roughly half the shutdown hangs in MediaShutdownManager::Shutdown() in 36.0b10
(20150217132925) and in beta channel build id 20150219174507 have one thread
in MediaDecoderStateMachine::StopAudioThread() and one or more threads in
wasapi_stream_init().
As an extreme example,
https://crash-stats.mozilla.com/report/index/f52b81d5-6ef4-44ea-96de-149272150219
has 10 threads in wasapi_stream_init().
Some links from bug 1129455
Beta:
https://crash-stats.mozilla.com/search/?product=Firefox&release_channel=beta&signature=~mozilla%3A%3AMediaShutdownManager%3A%3AShutdown&_facets=build_id&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-build_id
Aurora:
https://crash-stats.mozilla.com/search/?product=Firefox&release_channel=aurora&signature=~mozilla%3A%3AMediaShutdownManager%3A%3AShutdown&_facets=build_id&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-build_id
Overall stats
https://crash-stats.mozilla.com/report/list?signature=shutdownhang%20%7C%20WaitForSingleObjectEx%20%7C%20WaitForSingleObject%20%7C%20PR_Wait%20%7C%20nsThread%3A%3AProcessNextEvent%28bool%2C%20bool%2A%29%20%7C%20NS_ProcessNextEvent%28nsIThread%2A%2C%20bool%29%20%7C%20mozilla%3A%3AMediaShutdownManager%3A%3AShutdown%28%29+++ This bug was initially created as a clone of Bug #1129455 +++
Reporter | ||
Comment 1•10 years ago
|
||
https://bugzilla.mozilla.org/show_bug.cgi?id=1129455#c6 implies that this hang is not as frequent as earlier betas, but that may be because YouTube is now using Flash instead of HTML5 video.
Blocks: MSE
Assignee | ||
Comment 2•10 years ago
|
||
These changes were backed out of beta and aurora, per bug 1133190 comment 40 and comment 42.
Reporter | ||
Comment 3•10 years ago
|
||
(In reply to Matthew Gregan [:kinetik] from comment #2)
> These changes were backed out of beta and aurora, per bug 1133190 comment 40
> and comment 42.
Thanks. The shutdown hang in https://crash-stats.mozilla.com/report/index/f52b81d5-6ef4-44ea-96de-149272150219 for example was in build (36.0b10) that includes the backout of 82339d98aa6a.
Reporter | ||
Comment 4•10 years ago
|
||
Hi Paul. Do you have any ideas what might cause threads to wait in AlpcSendWaitReceivePort()?
Are any of the changes that have landed on Nightly likely to help here?
Flags: needinfo?(padenot)
Assignee | ||
Comment 5•10 years ago
|
||
Sorry, so it does; I didn't pay close enough attention to the backout vs build dates.
Looking at the near-identical stacks of the ten threads inside wasapi_stream_init makes me wonder about bug 1103824 (despite the crash report originating from a Win 7 machine and that note being about Win 8).
Comment 6•10 years ago
|
||
(In reply to Karl Tomlinson (:karlt) from comment #4)
> Hi Paul. Do you have any ideas what might cause threads to wait in
> AlpcSendWaitReceivePort()?
Not really. I just tried a bunch of stuff:
- Reading the disassembled code check if `IAudioClient::Initialize` calls `AudioServerInitialize` only the first time or each time: it's called every time, so this might not be the STA vs. MTA first stream creation issue problem (but what do I know?)
- Trying to enable/disable audio devices while playing. Two behaviours, here:
- if `cubeb_init` has not been called, it goes in `cubeb_init` and no audio device is available so `wasapi_init` fails, cubeb tries to use winmm, which does not check if there is an audio device, and it appears to deadlock between the winmm thread, some thread trying to shut the stream down, and the MediaShutdownManager on the main thread (minidump: http://paul.cx/public/winmm-shutdown.dmp, it's a bit big to attach here. This is for revision 368c62292249).
- if cubeb_init was already called, and it chose WASAPI, then future playbacks just fail.
- Killing audiodg.exe while a stream is running, it works fine as well, sound stops, everything cleans up fine after CC
- Killing audiodg.exe while in the middle of setting up the stream (single-stepping in assembly mode), right at the stack trace Karl mentions
I have to say I'm quite confused about this one. I tried searching the web for similar problems and stacks, and I see gecko crash reports with kind of the same stacks, from before I implemented the WASAPI backend (I think WinMM is implemented with WASAPI things underneath these days, which would explain that), and this [0], which seems to be the same issue (albeit for a capture device, not a playback device).
[0]: https://social.msdn.microsoft.com/Forums/windowsapps/en-US/05f1693d-4174-4d1e-b6f9-4acaac3cc05d/wasapi-iaudioclientinitialize-hanging-when-opening-audio-capture-device?forum=tailoringappsfordevices
> Are any of the changes that have landed on Nightly likely to help here?
I don't think so, those patches are not really changing the way we open streams.
Flags: needinfo?(padenot)
Updated•10 years ago
|
status-firefox37:
--- → affected
status-firefox38:
--- → affected
status-firefox39:
--- → affected
tracking-firefox37:
--- → +
tracking-firefox38:
--- → +
tracking-firefox39:
--- → +
Comment 7•10 years ago
|
||
Sheila, do you think this is still a p1 blocker for MSE? If so, can you help us find an owner to fix this by Thursday when we go to build for beta 7? Thanks!
Flags: needinfo?(smooney)
Comment 8•10 years ago
|
||
I reviewed this with the media team today. While we'd like a fix and will continue to pursue one, this is not a blocker for 37. I'm marking 37 as wontfix.
Comment 9•10 years ago
|
||
These signatures don't look like a top crash for 37 - there are a few hundred for the last 7 days.
Comment 10•10 years ago
|
||
Anthony, do you think we should expect for a fix for this bug in 38? Thanks
Flags: needinfo?(ajones)
Comment 11•10 years ago
|
||
The patch in Bug 1155432 may help with some of the hangs, but not with the ones related to problems with the audio playback stalling.
Assignee | ||
Comment 12•10 years ago
|
||
I filed Socorro bug 1156172 to try to get some better data on the severity and possible correlations for this particular hang.
Comment 13•10 years ago
|
||
Too late for 38. Any news for 39?
Assignee | ||
Updated•10 years ago
|
Assignee: nobody → kinetik
Status: NEW → ASSIGNED
Assignee | ||
Comment 14•10 years ago
|
||
(In reply to Paul Adenot (:padenot) from comment #6)
> - Killing audiodg.exe while in the middle of setting up the stream
> (single-stepping in assembly mode), right at the stack trace Karl mentions
It's possible that the IAudioStreamVolume leak that was fixed and then refixed (bug 1134977) was causing audiodg.exe to exhaust address space and crash, exacerbating this issue. That was fixed back at the start of April, so the hang stats should reflect an improvement after that if it's the issue.
Without much else to go on, I'm preparing a patch that follows the COM rules mentioned in the IAudioClient documentation on MSDN. It may or may not help with this hang, but at the least removing it as a possible contributor is worthwhile.
Assignee | ||
Comment 15•10 years ago
|
||
I'm fixing the COM issue in the bug specific to that (bug 1103824) since I am no more certain than "hopeful" that it'll help with these hangs.
Depends on: 1103824
Reporter | ||
Comment 16•10 years ago
|
||
This stack is a bit different, with many threads in MediaDecoderStateMachine::StopAudioThread() but only one visibly in wasapi_stream_destroy().
https://crash-stats.mozilla.com/report/index/436c2009-fc1d-4ea5-b231-7d7f62150516#allthreads
Comment 17•10 years ago
|
||
Matthew, is this crash still happening and do you think it's affecting 40 and 41?
Which signature or stack represents this issue now ? I'm asking because I'm tracking the bug for 39, and not sure where you might want to uplift a fix, or how we can verify that it's fixed.
Flags: needinfo?(kinetik)
Assignee | ||
Comment 18•10 years ago
|
||
(In reply to Liz Henry (:lizzard) from comment #17)
> Matthew, is this crash still happening and do you think it's affecting 40
> and 41?
Yes, and it will be. The patch in bug 1103824 bounced and I haven't solved the remaining issue yet.
> Which signature or stack represents this issue now ? I'm asking because I'm
> tracking the bug for 39, and not sure where you might want to uplift a fix,
> or how we can verify that it's fixed.
Since we don't have a way to reproduce or even an idea what the real cause is, the only indicator that there has been a fix or improvement will be a reduction in hangs in wasapi_stream_init, but unfortunately it doesn't show up as that in the crash/hang reporter, but as the generic hang under MediaShutdownManager::Shutdown.
Flags: needinfo?(kinetik)
Updated•10 years ago
|
status-firefox40:
--- → affected
status-firefox41:
--- → affected
tracking-firefox40:
--- → +
tracking-firefox41:
--- → +
Comment 19•10 years ago
|
||
OK, marking wontfix for 39. We will cross our fingers hopefully as you all continue to unknot the other shutdown hangs. Thanks for the extra details Matthew!
Assignee | ||
Updated•10 years ago
|
Assignee | ||
Comment 20•10 years ago
|
||
The summary of the bug in the Chromium bug tracker I just added to "see also" is that they're hitting the same IAudioClient::Initialize hang in some cases, took a few shots at fixing it (including a similar COM initialization change to bug 1103824) and haven't had a great deal of luck finding a solution yet either.
Comment 21•10 years ago
|
||
Dale from Chromium here, found your bug via search. Yeah, we have no ideas about this :( Matthew van Eerde from Microsoft ( http://blogs.msdn.com/b/matthew_van_eerde/ ) has requested the following:
"If you hit a hang in audioses.dll calling into rpcrt4.dll, can you take a dump of the svchost.exe process containing the AudioSrv service? I'm happy to investigate from there."
https://code.google.com/p/chromium/issues/detail?id=422522#c150
I'll update here if we find anything else out.
Comment 22•10 years ago
|
||
This bug is now wontfix for 40.
Matthew, this crash signature is showing up with 40.0.2 (~3000 crashes) and 40.0.3. Based on the comments above, I do not think we can do much here. Or is there a speculative fix that might help? Thanks.
Flags: needinfo?(kinetik)
Assignee | ||
Comment 24•9 years ago
|
||
Unfortunately I don't have a speculative fix for this. The one I did have (bug 1103824) bounced due to a regression, and after that I discovered the Chrome guys had tried the same fix with no positive result.
Having said that, 3000 crashes seems like a lot, is there a way we can work out the magnitude of severity compared to the same hang being seen in Chrome (which, if I understood their bug correctly, sounded less severe than what we're seeing)?
Flags: needinfo?(kinetik)
Comment 25•9 years ago
|
||
I can't go into exact numbers, but ~8% of our crashes on our dev channel come from this issue; it's definitely a top 5 crasher for us.
David, would you be able to address Matthew's question in comment 24? I am wondering if this crash will be fixed anytime soon for 41 or it is already too late and I can wontfix it for 41.
Flags: needinfo?(dmajor)
![]() |
||
Comment 27•9 years ago
|
||
This signature is under 1% of crashes on 40.0.3 release. (Referring to raw crash counts misses out on some perspective)
Plus my understanding of bug 1156172 is that only a fraction of the volume is due to the wasapi issue anyway.
At this point I think we have bigger fish to fry. We might as well drop the tracking flags. Rest assured that the stability team will make noise and re-request tracking if this becomes a bigger problem.
Flags: needinfo?(dmajor)
wontfix for 41 based on comment 27, which makes sense.
Updated•9 years ago
|
Component: Audio/Video → Audio/Video: Playback
Assignee | ||
Comment 29•9 years ago
|
||
Same for 42.
Updated•9 years ago
|
Crash Signature: [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::MediaShutdownManager::Shutdown()]
[@ shutdownhang | ntdll.dll@0x3c6bc] → [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::MediaShutdownManager::Shutdown()]
[@ shutdownhang | ntdll.dll@0x3c6bc]
[@ shutdownhang |…
Updated•9 years ago
|
Priority: P1 → P2
[Tracking Requested - why for this release]:
This hang is 1.39% of signatures on 44.0, and our sixth largest signature.
https://crash-stats.mozilla.com/signature/?product=Firefox&signature=shutdownhang+%7C+WaitForSingleObjectEx+%7C+WaitForSingleObject+%7C+PR_WaitCondVar+%7C+nsThread%3A%3AProcessNextEvent+%7C+NS_ProcessNextEvent+%7C+mozilla%3A%3ASharedThreadPool%3A%3ASpinUntilEmpty&version=44.0&date=%3C2016-02-08T18%3A29%3A58&date=%3E%3D2016-02-01T18%3A29%3A58
tracking-firefox45:
--- → ?
Comment 32•9 years ago
|
||
I agree that it would be great to fix it but we have been wontfixing it for a long time.
Not sure what would be the point of tracking it anymore...
Comment 33•9 years ago
|
||
I don't think there's anything you can do about this crash. At least we over at Chrome haven't found anything and Microsoft has explicitly said they don't think a client application can do anything to workaround this. It's due to misbehaving OS level drivers causing a deadlock. See comment#21 for more details.
Comment 34•9 years ago
|
||
Thanks Dale, sad state of affair, but at least now we know where we stand.
Mass change P2 -> P3
Priority: P2 → P3
Updated•8 years ago
|
Crash Signature: mozilla::MediaShutdownManager::Shutdown] → mozilla::MediaShutdownManager::Shutdown]
[@ AsyncShutdownTimeout | profile-before-change | MediaShutdownManager: shutdown]
Comment 37•8 years ago
|
||
Bug 1308629 is the Mac version of this crash.
Assignee | ||
Updated•7 years ago
|
Status: ASSIGNED → NEW
Updated•4 years ago
|
Whiteboard: qa-not-actionable
Updated•2 years ago
|
Severity: critical → S2
Comment 38•2 years ago
|
||
Since the crash volume is low (less than 5 per week), the severity is downgraded to S3
. Feel free to change it back if you think the bug is still critical.
For more information, please visit auto_nag documentation.
Severity: S2 → S3
Updated•2 years ago
|
Crash Signature: , bool) | mozilla::MediaShutdownManager::Shutdown()]
[@ shutdownhang | ntdll.dll@0x3c6bc]
[@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | nsThread::ProcessNextEvent | NS_ProcessNextEvent | → , bool) | mozilla::MediaShutdownManager::Shutdown]
[@ shutdownhang | ntdll.dll@0x3c6bc]
[@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | nsThread::ProcessNextEvent | NS_ProcessNextEvent |
Comment 39•1 year ago
|
||
Closing because no crashes reported for 12 weeks.
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•