Open Bug 1427011 Opened 2 years ago Updated 4 months ago
Crash in CAudio
Session Control::Queue Stream Switch
59 bytes, text/x-review-board-request
This bug was filed from the Socorro interface and is report bp-4d9f33f4-382d-4c63-aeda-2e1540171218. ============================================================= Top 10 frames of crashing thread: 0 audioses.dll CAudioSessionControl::QueueStreamSwitch 1 audioses.dll CAudioSessionControl::CDefaultDeviceChangeHandler::OnDefaultDeviceChanged 2 mmdevapi.dll CDeviceEnumerator::OnDefaultDeviceChanged 3 mmdevapi.dll _chkstk 4 mmdevapi.dll ATL::CAtlArray<ATL::CComQIPtr<IPnpNotificationClient, &__s_GUID const _GUID_11baae68_d8ee_45b4_b541_97d517a6f57a>, ATL::CComQIPtrElementTraits<IPnpNotificationClient, &__s_GUID const _GUID_11baae68_d8ee_45b4_b541_97d517a6f57a> >::~CAtlArray<ATL::CComQIPtr<IPnpNotificationClient, &__s_GUID const _GUID_11baae68_d8ee_45b4_b541_97d517a6f57a>, ATL::CComQIPtrElementTraits<IPnpNotificationClient, &__s_GUID const _GUID_11baae68_d8ee_45b4_b541_97d517a6f57a> > 5 ntdll.dll RtlAllocateMemoryBlockLookaside 6 ntdll.dll TppWorkCallbackPrologRelease 7 ntdll.dll TppSimplepExecuteCallback 8 ntdll.dll TpPostTask 9 kernel32.dll BaseThreadInitThunk ============================================================= these crashes have been around for while but apparently are increasing in volume in firefox 57. they're mostly hitting users on windows 7. numerous comments point towards changing device configuration while firefox is running (un/plugging monitors & loudspeakers, removing laptop from a docking station and going into sleep mode) as a source of the crash.
(In reply to [:philipp] from comment #0) > This bug was filed from the Socorro interface and is > report bp-4d9f33f4-382d-4c63-aeda-2e1540171218. > ============================================================= > > Top 10 frames of crashing thread: > > 0 audioses.dll CAudioSessionControl::QueueStreamSwitch > 1 audioses.dll > CAudioSessionControl::CDefaultDeviceChangeHandler::OnDefaultDeviceChanged > 2 mmdevapi.dll CDeviceEnumerator::OnDefaultDeviceChanged Munro^^
Component: Audio/Video: Playback → WebRTC: Audio/Video
MediaDevices.ondevicechange doesn't use IAudioSessionControl interface. In Gecko, there is only one file AudioSession.cpp  using it.  https://searchfox.org/mozilla-central/rev/7c67b86074cbc1b63c8beccfb06d2ab9295140b8/widget/windows/AudioSession.cpp
Flags: needinfo?(mchiang) → needinfo?(bwu)
It looks like related to widget. Let's change component.
Component: WebRTC: Audio/Video → Widget
Yet another odd audio interface crash on Win7. David, any ideas?
It looks like the IAudioSessionControl thats failing is a Windows object. I believe at that point it is calling the IMMNotificationClients to alert them of the device change. We see this crash in all three process types (nearly all are parent proc but 100 or so are content and a couple dozen are plugin). So I think cubeb's is the likely culprit. If so, I'm probably at the limit of what I can contribute here. This is long-winded because its all reasoning. There is no STR and I'm not 100% sure of everything. --- This is the top of most of the stacks: > AudioSes.dll!CAudioSessionControl::QueueStreamSwitch(enum __MIDL___MIDL_itf_mmdeviceapi_0000_0000_0001,enum __MIDL___MIDL_itf_mmdeviceapi_0000_0000_0002,unsigned short const *) Unknown > AudioSes.dll!CAudioSessionControl::CDefaultDeviceChangeHandler::OnDefaultDeviceChanged(enum __MIDL___MIDL_itf_mmdeviceapi_0000_0000_0001,enum __MIDL___MIDL_itf_mmdeviceapi_0000_0000_0002,unsigned short const *) Unknown > MMDevAPI.dll!CDeviceEnumerator::OnDefaultDeviceChanged(enum __MIDL___MIDL_itf_mmdeviceapi_0000_0000_0001,enum __MIDL___MIDL_itf_mmdeviceapi_0000_0000_0002,unsigned short const *) Unknown CDeviceEnumerator::OnDefaultDeviceChanged is iterating over the IMMNotificationClients, passing one through the child functions: > 00007FF8A9F7E361 E8 32 04 00 00 call ATL::CAtlList<IMMNotificationClient * __ptr64,ATL::CElementTraits<IMMNotificationClient * __ptr64> >::GetNext (07FF8A9F7E798h) > 00007FF8A9F7E366 48 8B 38 mov rdi,qword ptr [rax] [...] > 00007FF8A9F7E382 48 8B CF mov rcx,rdi and later in CAudioSessionControl::QueueStreamSwitch, we get this: > 00007FF8A86CB1BC 4C 8B F1 mov r14,rcx [...] > 00007FF8A86CB1F4 41 39 BE 20 01 00 00 cmp dword ptr [r14+120h],edi The last line is the crash. Only this one of the crashes  was really useful here. They all have 0x0 or 0xffffffffffffffff as the failing address despite the fact that the math wouldn't produce that (apparently something they or we do with invalid pointer math that has a 1 in the high bit -- its 0x0 for Win7 and 0xff... for Win8+) except this one which had 0x14 in rdi and 0x134 as the crash address -- and 0x120 + 0x14 == 0x134. I assume this is the memory where the Windows held the pointer being overwritten. So its got an invalid IMMNotificationClient. There are 3 places where we set them up. They are: 1) The AudioNotificationSender . This only exists in the parent process. 2) The one for PluginUtils, which also only exists in the parent process  3) cubeb  Looking at the cubeb code, its use of the IMMNotificationClient looks right but the behavior around the memory management is too deep for me to follow. I will say that I don't think the issue is the wasapi_endpoint_notification_client's (ie the IMMNotificationClient's) ref count -- if anything, I'd guess its the cubeb_stream that owns it. The cubeb_stream would have to be destroyed without calling wasapi_cubeb_stream_destroy and I don't see any obvious way for that to happen. But its important to note that (Un)RegisterEndpointNotificationCallback don't affect the refcount so deleting them is easy. (Seems highly unlikely but I think that someone _could_ be DLL-injecting and adding an IMMNotificationClient.) NI to :kinetik for cubeb help. -----  https://crash-stats.mozilla.com/report/index/1b9673d9-29c9-4cad-a17b-f441e0180204#tab-rawdump  https://searchfox.org/mozilla-central/source/dom/media/AudioNotificationSender.cpp#53  https://searchfox.org/mozilla-central/source/dom/plugins/ipc/PluginUtilsWin.cpp#50  https://searchfox.org/mozilla-central/source/media/libcubeb/src/cubeb_wasapi.cpp#298
Assignee: davidp99 → nobody
Thanks for the analysis! Looking at the diffs in cubeb_wasapi.cpp between 56, 57, and 58, there haven't been any major changes (aside the string interning) or changes that are obviously related to this area. We used to leak the notification client if UnregisterEndpointNotificationCallback returned an error as a fix for bug 1263514, but we lost that workaround unintentionally when we moved from manually refcounting in cubeb_wasapi.cpp to using a smart pointer... the comment about leaking the notification client is still present, and we return early, but the smart pointer will still cause the notification client's refcount to drop when the dtor runs and possibly be freed (reading the docs for RegisterEndpointNotificationCallback and then bug 1263514 it's a bit confusing whether RegisterEndpointNotificationCallback is bumping our notification client's refcount). Having said that, those changes landed in 53, so if that's the issue it's strange not to see this spiking until 57/58. The other thing that sprang to mind is that we could be hitting the emergency_bailout code (where we hit a long timeout when it looks like the OS APIs have stopped responding and try to shut down as cleanly as possible without hanging), which may result in the notification client being unregistered (and destroyed) while parts of the OS side of the stream could return to life later. But again, that landed in 53 (and was uplifted back to 51). So nothing conclusive just yet. I'll keep thinking about it.
I've looked at many crashes, examined the code, and thought about this... but don't have a solution yet. Since libcubeb looks like the obvious culprit here, my current plan is to disable the EndpointNotificationCallback code inside libcubeb when Gecko's AudioNotification is in use, since AudioNotification effectively duplicates libcubeb's code when it's active. AudioNotification lives in ContentParent so it's only active in E10S mode; we could move the initialization elsewhere to cover non-E10S if this initial experiment has a positive effect.
Assignee: nobody → kinetik
Status: NEW → ASSIGNED
Comment on attachment 8971750 [details] Bug 1427011 - Disable default device switching in libcubeb's WASAPI backend. https://reviewboard.mozilla.org/r/240518/#review246320 Looks great but you forgot to add `disable-device-switching.patch` to this patch.
Attachment #8971750 - Flags: review?(padenot) → review+
Pushed by email@example.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/f38524bd2735 Disable default device switching in libcubeb's WASAPI backend. r=padenot
(In reply to Paul Adenot (:padenot) from comment #10) > Looks great but you forgot to add `disable-device-switching.patch` to this > patch. Whoops, thanks for pointing that out. Landed with that added: https://hg.mozilla.org/integration/mozilla-inbound/rev/f38524bd2735d9885d9f5c69980f755f9816bed3 Marking this leave-open, since this change is speculative and I'll need to track the crash stats after this hits beta and has enough exposure to indicate any change in crash rates.
(In reply to Matthew Gregan [:kinetik] from comment #12) > Marking this leave-open, since this change is speculative and I'll need to > track the crash stats after this hits beta and has enough exposure to > indicate any change in crash rates. the volume of this signature has somewhat decreased during the 61.0b cycle, but it's not gone for good.
I'm not sure what these numbers tell us. Crash count is down, which might suggest libcubeb was responsible for some of the crashes... on the other hand 20% of crashes are in content processes, which libcubeb is no longer registering an IMMNotificationClient in. Current release shows 10% of crashes in content processes. That seems to suggest something other than libcubeb is (also?) involved here.
Crash Signature: [@ CAudioSessionControl::QueueStreamSwitch] → [@ CAudioSessionControl::QueueStreamSwitch] [@ _chkstk | CreateNotificationBlock ]
You need to log in before you can comment on or make changes to this bug.