Open Bug 1427011 Opened 2 years ago Updated 4 months ago

Crash in CAudioSessionControl::QueueStreamSwitch

Categories

(Core :: Audio/Video: cubeb, defect, P2, critical)

57 Branch
All
Windows 7
defect

Tracking

()

ASSIGNED
Tracking Status
firefox-esr52 --- wontfix
firefox-esr60 --- wontfix
firefox-esr68 --- affected
firefox57 --- wontfix
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox66 --- wontfix
firefox67 --- wontfix
firefox68 --- wontfix
firefox69 --- fix-optional
firefox70 --- fix-optional

People

(Reporter: philipp, Assigned: kinetik)

References

Details

(Keywords: crash, leave-open, regression)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is
report bp-4d9f33f4-382d-4c63-aeda-2e1540171218.
=============================================================

Top 10 frames of crashing thread:

0 audioses.dll CAudioSessionControl::QueueStreamSwitch 
1 audioses.dll CAudioSessionControl::CDefaultDeviceChangeHandler::OnDefaultDeviceChanged 
2 mmdevapi.dll CDeviceEnumerator::OnDefaultDeviceChanged 
3 mmdevapi.dll _chkstk 
4 mmdevapi.dll ATL::CAtlArray<ATL::CComQIPtr<IPnpNotificationClient, &__s_GUID const _GUID_11baae68_d8ee_45b4_b541_97d517a6f57a>, ATL::CComQIPtrElementTraits<IPnpNotificationClient, &__s_GUID const _GUID_11baae68_d8ee_45b4_b541_97d517a6f57a> >::~CAtlArray<ATL::CComQIPtr<IPnpNotificationClient, &__s_GUID const _GUID_11baae68_d8ee_45b4_b541_97d517a6f57a>, ATL::CComQIPtrElementTraits<IPnpNotificationClient, &__s_GUID const _GUID_11baae68_d8ee_45b4_b541_97d517a6f57a> > 
5 ntdll.dll RtlAllocateMemoryBlockLookaside 
6 ntdll.dll TppWorkCallbackPrologRelease 
7 ntdll.dll TppSimplepExecuteCallback 
8 ntdll.dll TpPostTask 
9 kernel32.dll BaseThreadInitThunk 

=============================================================

these crashes have been around for  while but apparently are increasing in volume in firefox 57. they're mostly hitting users on windows 7.

numerous comments point towards changing device configuration while firefox is running (un/plugging monitors & loudspeakers, removing laptop from a docking station and going into sleep mode) as a source of the crash.
(In reply to [:philipp] from comment #0)
> This bug was filed from the Socorro interface and is
> report bp-4d9f33f4-382d-4c63-aeda-2e1540171218.
> =============================================================
> 
> Top 10 frames of crashing thread:
> 
> 0 audioses.dll CAudioSessionControl::QueueStreamSwitch 
> 1 audioses.dll
> CAudioSessionControl::CDefaultDeviceChangeHandler::OnDefaultDeviceChanged 
> 2 mmdevapi.dll CDeviceEnumerator::OnDefaultDeviceChanged 
Munro^^
Component: Audio/Video: Playback → WebRTC: Audio/Video
Flags: needinfo?(mchiang)
MediaDevices.ondevicechange doesn't use IAudioSessionControl interface.
In Gecko, there is only one file AudioSession.cpp [1] using it.

[1] https://searchfox.org/mozilla-central/rev/7c67b86074cbc1b63c8beccfb06d2ab9295140b8/widget/windows/AudioSession.cpp
Flags: needinfo?(mchiang) → needinfo?(bwu)
It looks like related to widget. 
Let's change component.
Component: WebRTC: Audio/Video → Widget
Flags: needinfo?(bwu)
Yet another odd audio interface crash on Win7. David, any ideas?
Flags: needinfo?(davidp99)
Priority: -- → P2
Flags: needinfo?(davidp99)
Priority: P2 → P1
Assignee: nobody → davidp99
It looks like the IAudioSessionControl thats failing is a Windows object.  I believe at that point it is calling the IMMNotificationClients to alert them of the device change.  We see this crash in all three process types (nearly all are parent proc but 100 or so are content and a couple dozen are plugin).  So I think cubeb's is the likely culprit.  If so, I'm probably at the limit of what I can contribute here.

This is long-winded because its all reasoning.  There is no STR and I'm not 100% sure of everything.

---

This is the top of most of the stacks:

>  AudioSes.dll!CAudioSessionControl::QueueStreamSwitch(enum __MIDL___MIDL_itf_mmdeviceapi_0000_0000_0001,enum __MIDL___MIDL_itf_mmdeviceapi_0000_0000_0002,unsigned short const *)  Unknown
>  AudioSes.dll!CAudioSessionControl::CDefaultDeviceChangeHandler::OnDefaultDeviceChanged(enum __MIDL___MIDL_itf_mmdeviceapi_0000_0000_0001,enum __MIDL___MIDL_itf_mmdeviceapi_0000_0000_0002,unsigned short const *)  Unknown
>  MMDevAPI.dll!CDeviceEnumerator::OnDefaultDeviceChanged(enum __MIDL___MIDL_itf_mmdeviceapi_0000_0000_0001,enum __MIDL___MIDL_itf_mmdeviceapi_0000_0000_0002,unsigned short const *)  Unknown

CDeviceEnumerator::OnDefaultDeviceChanged is iterating over the IMMNotificationClients, passing one through the child functions:

> 00007FF8A9F7E361 E8 32 04 00 00       call        ATL::CAtlList<IMMNotificationClient * __ptr64,ATL::CElementTraits<IMMNotificationClient * __ptr64> >::GetNext (07FF8A9F7E798h)  
> 00007FF8A9F7E366 48 8B 38             mov         rdi,qword ptr [rax]  
[...]
> 00007FF8A9F7E382 48 8B CF             mov         rcx,rdi  

and later in CAudioSessionControl::QueueStreamSwitch, we get this:

> 00007FF8A86CB1BC 4C 8B F1             mov         r14,rcx  
[...]
> 00007FF8A86CB1F4 41 39 BE 20 01 00 00 cmp         dword ptr [r14+120h],edi

The last line is the crash.  Only this one of the crashes [1] was really useful here.  They all have 0x0 or 0xffffffffffffffff as the failing address despite the fact that the math wouldn't produce that (apparently something they or we do with invalid pointer math that has a 1 in the high bit -- its 0x0 for Win7 and 0xff... for Win8+) except this one which had 0x14 in rdi and 0x134 as the crash address -- and 0x120 + 0x14 == 0x134.  I assume this is the memory where the Windows held the pointer being overwritten.

So its got an invalid IMMNotificationClient.  There are 3 places where we set them up.  They are:

1) The AudioNotificationSender [2].  This only exists in the parent process.
2) The one for PluginUtils, which also only exists in the parent process [3]
3) cubeb [4]

Looking at the cubeb code, its use of the IMMNotificationClient looks right but the behavior around the memory management is too deep for me to follow.  I will say that I don't think the issue is the wasapi_endpoint_notification_client's (ie the IMMNotificationClient's) ref count -- if anything, I'd guess its the cubeb_stream that owns it.  The cubeb_stream would have to be destroyed without calling wasapi_cubeb_stream_destroy and I don't see any obvious way for that to happen.  But its important to note that (Un)RegisterEndpointNotificationCallback don't affect the refcount so deleting them is easy.

(Seems highly unlikely but I think that someone _could_ be DLL-injecting and adding an IMMNotificationClient.)

NI to :kinetik for cubeb help.

-----

[1] https://crash-stats.mozilla.com/report/index/1b9673d9-29c9-4cad-a17b-f441e0180204#tab-rawdump
[2] https://searchfox.org/mozilla-central/source/dom/media/AudioNotificationSender.cpp#53
[3] https://searchfox.org/mozilla-central/source/dom/plugins/ipc/PluginUtilsWin.cpp#50
[4] https://searchfox.org/mozilla-central/source/media/libcubeb/src/cubeb_wasapi.cpp#298
Assignee: davidp99 → nobody
Flags: needinfo?(kinetik)
Component: Widget → Audio/Video: cubeb
Priority: P1 → --
Thanks for the analysis!

Looking at the diffs in cubeb_wasapi.cpp between 56, 57, and 58, there haven't been any major changes (aside the string interning) or changes that are obviously related to this area.

We used to leak the notification client if UnregisterEndpointNotificationCallback returned an error as a fix for bug 1263514, but we lost that workaround unintentionally when we moved from manually refcounting in cubeb_wasapi.cpp to using a smart pointer... the comment about leaking the notification client is still present, and we return early, but the smart pointer will still cause the notification client's refcount to drop when the dtor runs and possibly be freed (reading the docs for RegisterEndpointNotificationCallback and then bug 1263514 it's a bit confusing whether RegisterEndpointNotificationCallback is bumping our notification client's refcount).  Having said that, those changes landed in 53, so if that's the issue it's strange not to see this spiking until 57/58.

The other thing that sprang to mind is that we could be hitting the emergency_bailout code (where we hit a long timeout when it looks like the OS APIs have stopped responding and try to shut down as cleanly as possible without hanging), which may result in the notification client being unregistered (and destroyed) while parts of the OS side of the stream could return to life later.  But again, that landed in 53 (and was uplifted back to 51).

So nothing conclusive just yet.  I'll keep thinking about it.
Rank: 10
Priority: -- → P2
I've looked at many crashes, examined the code, and thought about this... but don't have a solution yet.

Since libcubeb looks like the obvious culprit here, my current plan is to disable the EndpointNotificationCallback code inside libcubeb when Gecko's AudioNotification is in use, since AudioNotification effectively duplicates libcubeb's code when it's active.  AudioNotification lives in ContentParent so it's only active in E10S mode; we could move the initialization elsewhere to cover non-E10S if this initial experiment has a positive effect.
Assignee: nobody → kinetik
Status: NEW → ASSIGNED
Flags: needinfo?(kinetik)
Comment on attachment 8971750 [details]
Bug 1427011 - Disable default device switching in libcubeb's WASAPI backend.

https://reviewboard.mozilla.org/r/240518/#review246320

Looks great but you forgot to add `disable-device-switching.patch` to this patch.
Attachment #8971750 - Flags: review?(padenot) → review+
Pushed by mgregan@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f38524bd2735
Disable default device switching in libcubeb's WASAPI backend.  r=padenot
(In reply to Paul Adenot (:padenot) from comment #10)
> Looks great but you forgot to add `disable-device-switching.patch` to this
> patch.

Whoops, thanks for pointing that out.  Landed with that added: https://hg.mozilla.org/integration/mozilla-inbound/rev/f38524bd2735d9885d9f5c69980f755f9816bed3

Marking this leave-open, since this change is speculative and I'll need to track the crash stats after this hits beta and has enough exposure to indicate any change in crash rates.
Keywords: leave-open
(In reply to Matthew Gregan [:kinetik] from comment #12)
> Marking this leave-open, since this change is speculative and I'll need to
> track the crash stats after this hits beta and has enough exposure to
> indicate any change in crash rates.

the volume of this signature has somewhat decreased during the 61.0b cycle, but it's not gone for good.
I'm not sure what these numbers tell us.  Crash count is down, which might suggest libcubeb was responsible for some of the crashes... on the other hand 20% of crashes are in content processes, which libcubeb is no longer registering an IMMNotificationClient in.  Current release shows 10% of crashes in content processes.  That seems to suggest something other than libcubeb is (also?) involved here.
Duplicate of this bug: 1508221

Updating affected branches since I see nightly crashes while doing triage. One comment stood out "unplugged usb headphones while watching video."

(In reply to Gabriele Svelto [:gsvelto] from comment #4) Bug 1508221

It might be worth moving the crash signature of this bug over to bug 1427011
to track it.

Crash Signature: [@ CAudioSessionControl::QueueStreamSwitch] → [@ CAudioSessionControl::QueueStreamSwitch] [@ _chkstk | CreateNotificationBlock ]
Duplicate of this bug: 1560735

A potential way to trigger this crash is mentioned in bug 1560735 comment 5 (and 0). I attempted to reproduce (but using the macOS RDP client - the client shouldn't matter) and couldn't trigger a crash or hang.

This happens when using RDP for Android
https://bugzilla.mozilla.org/show_bug.cgi?id=1560735

Bugbug thinks this bug is a regression, but please revert this change in case of error.

Keywords: regression

I couldn't repro locally. I see that the crash report linked from bug 1560735 has Nahimic (audio software) DLLs present, it's possible that's related.

Flags: needinfo?(kinetik)
You need to log in before you can comment on or make changes to this bug.