Closed Bug 1614321 Opened 9 months ago Closed 7 months ago

Crash in [@ IPCError-browser | ShutDownKill | NtAlpcSendWaitReceivePort]

Categories

(Core :: Widget: Win32, defect, P3)

Unspecified
Windows 10
defect

Tracking

RESOLVED FIXED
Tracking Status
firefox-esr68 --- wontfix
firefox73 --- wontfix
firefox74 --- wontfix
firefox75 --- fixed
firefox76 --- fixed

People

(Reporter: pascalc, Unassigned)

Details

(Keywords: crash)

Crash Data

This bug is for crash report bp-b3fc3bd6-a805-4f68-817c-3f7500200209.

Top 10 frames of crashing thread:

0 ntdll.dll NtAlpcSendWaitReceivePort 
1 rpcrt4.dll long LRPC_BASE_CCALL::DoSendReceive 
2 audioses.dll AEWMILOG_DROP 
3 rpcrt4.dll void Ndr64UDTSimpleTypeMarshall1 
4 rpcrt4.dll void Ndr64SupplementMarshall 
5 rpcrt4.dll virtual long LRPC_CCALL::SendReceive 
6 rpcrt4.dll virtual void* LRPC_CCALL::`scalar deleting destructor' 
7 rpcrt4.dll I_RpcSendReceive 
8 rpcrt4.dll NdrSendReceive 
9 audioses.dll AEWMILOG_DROP 

Nightly-only hung content processes at shutdown. The vast majority of the stacks include this frame:

https://hg.mozilla.org/mozilla-central/annotate/cb56699431a0f051c40a8d0c765826e710de0aad/widget/windows/AudioSession.cpp#l303

The other crashes also seem to point to audio-related operations. Could they just be slow?

Component: General → Widget: Win32
Duplicate of this bug: 1614585
Crash Signature: [@ IPCError-browser | ShutDownKill | NtAlpcSendWaitReceivePort] → [@ IPCError-browser | ShutDownKill | NtAlpcSendWaitReceivePort] [@ IPCError-browser | ShutDownKill | RtlpWalkFrameChain | RtlWalkFrameChain | RtlCaptureStackBackTrace | CDeviceEnumerator::UnregisterEndpointNotificationCallback ]

I did another pass over the crash reports and established a couple of things:

  • The content processes aren't hung, they're just being slow. Most would shut down correctly if given enough time.
  • Destroying the IAudioSessionControl object is slow, so we try to do it on a background thread.
  • However, in these stacks we're actually releasing the object on the main thread, which means we failed to spawn the other thread and ended up here.
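
For anyone following along, here is a minimal sketch of the pattern described in the bullets above: hand the slow-to-destroy object to a background thread, and fall back to destroying it on the calling thread if the thread cannot be spawned. This is not the code in AudioSession.cpp; `SlowSession`, `ReleasedOn` and `ReleaseOffMainThread` are hypothetical names, and `std::thread` stands in for whatever thread API the real code uses.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <memory>
#include <system_error>
#include <thread>

// Hypothetical stand-in for an object whose destruction is slow
// (in the bug, the final release of IAudioSessionControl).
struct SlowSession {
  std::atomic<int>* destroyed;
  explicit SlowSession(std::atomic<int>* counter) : destroyed(counter) {}
  ~SlowSession() {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    destroyed->fetch_add(1);
  }
};

enum class ReleasedOn { Background, MainThread };

// Try to destroy `session` on a background thread. If the thread cannot be
// created, destroy it on the calling thread instead. That fallback is the
// path suspected in the stacks discussed in this bug.
ReleasedOn ReleaseOffMainThread(std::unique_ptr<SlowSession> session,
                                std::thread& worker) {
  assert(session);
  // Give up unique ownership so exactly one of the two paths deletes it.
  SlowSession* raw = session.release();
  try {
    worker = std::thread([raw] { delete raw; });
    return ReleasedOn::Background;
  } catch (const std::system_error&) {
    // Thread creation failed, so the lambda never ran: release here.
    delete raw;
    return ReleasedOn::MainThread;
  }
}
```

The caller is expected to join (or otherwise wait on) `worker` before the process exits; if shutdown does not wait long enough, the slow release can still be caught by the shutdown kill timer.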

David, since you wrote this code, does my analysis seem correct to you? Do you have any ideas why thread creation might fail and we're stuck on the main thread?

Flags: needinfo?(davidp99)

:gsvelto, the behavior you are talking about was introduced in bug 1419488. It works the way you suggest, but it's limited to Windows 7 [1], which means it can't be responsible for most of what we are seeing here (nearly all crashes are Win 10).

From here it gets complicated. The crash with CDeviceEnumerator::UnregisterEndpointNotificationCallback -- the one that came in when bug 1614585 was duplicated to this one -- very much mirrors what we saw in bug 1419488. And that crash is currently 100% on Windows 10, going back 6 months. So there may be two things there: (1) it's not actually a dupe of this and (2) we should extend the code in [1] to work on Windows 10, not just Windows 7, as the hang now shows up there too.

That would still leave the crash in comment 0. The crash is in system code and doesn't seem to be giving any really useful data. It happens in all versions of Windows. Crash-stats says that many of them are actually startup crashes (I don't know how much to believe this). The crash is a core system-RPC-related operation and it looks like some of them have some audio stuff on the stack but most don't. The ones that look audio-related seem to be in shutdown/hang behavior (again, I'm not certain of this). So I took a look at the Windows 7 crashes under that signature [2]. I've only checked a dozen or so but none were clearly audio related and most seemed very much not (most, but not all, seem to be in graphics). From all that, I'm thinking that extending the Win 7 audio fix above to the rest of Windows may also fix the audio-related crashes under this signature, but not fix the crash in its entirety since it probably has many causes.

I'm going to un-dupe bug 1614585 and extend the Win 7 fix to the rest of Windows there. I'll leave this bug open because I don't know how best to deal with it.


[1] https://searchfox.org/mozilla-central/rev/c1e3d3edd4a9b784971555dc74a5de23d768b2e1/widget/windows/AudioSession.cpp#281
[2] https://crash-stats.mozilla.org/signature/?platform_pretty_version=~7&signature=IPCError-browser%20%7C%20ShutDownKill%20%7C%20NtAlpcSendWaitReceivePort&date=%3E%3D2019-08-21T18%3A42%3A00.000Z&date=%3C2020-02-21T18%3A42%3A00.000Z&_columns=date&_columns=version&_columns=build_id&_columns=reason&_columns=address&_columns=install_time&_columns=startup_crash&_columns=platform_pretty_version&_sort=-date&page=1

Flags: needinfo?(davidp99)

FYI: There are actually Windows 7 crashes with the CDeviceEnumerator::UnregisterEndpointNotificationCallback signature in that list, because they are from builds that predate the fix from bug 1419488, which landed in version 62.

Thanks for the very detailed explanation David! I've opened a bunch of crash reports under the signature that was added in comment 0 and most of them have the IAudioSessionControl destruction on the stack... but not all of them. That's why I originally duped bug 1614585 against this one: they seemed the same, but apparently there are at least two different stacks under this signature.

I will remove the second signature so we keep the bugs separate. My guess is that once you've landed the fix for bug 1614585 the crash volume here will go down and we'll be left with only the non-audio stacks.

Crash Signature: [@ IPCError-browser | ShutDownKill | NtAlpcSendWaitReceivePort] [@ IPCError-browser | ShutDownKill | RtlpWalkFrameChain | RtlWalkFrameChain | RtlCaptureStackBackTrace | CDeviceEnumerator::UnregisterEndpointNotificationCallback ] → [@ IPCError-browser | ShutDownKill | NtAlpcSendWaitReceivePort]

Bug 1617283 might be factoring into this.

Ah yes, that'd be nice! Seems like this issue is going away soon.

The priority flag is not set for this bug.
:jimm, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jmathies)
Flags: needinfo?(jmathies)
Priority: -- → P3

After the fix for bug 1614585 landed, the volume here dropped almost to zero, with all the audio-related crashes going away. The few remaining reports have stack traces that are all over the place, so I'm marking this fixed and adding the signature to the "slow" ones.

Status: NEW → RESOLVED
Closed: 7 months ago
Resolution: --- → FIXED