Closed Bug 1452472 Opened 2 years ago Closed 2 years ago

Crash in InvalidArrayIndex_CRASH | nsTArray_Impl<T>::operator[] | mozilla::MediaEngineWebRTCMicrophoneSource::Stop

Categories

(Core :: WebRTC: Audio/Video, defect, P1, critical)

60 Branch
All
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox-esr52 --- unaffected
firefox59 --- unaffected
firefox60 - fixed
firefox61 - fixed

People

(Reporter: philipp, Assigned: pehrsons)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is
report bp-21c9e35e-b273-482c-9629-8a3720180407.
=============================================================

Top 10 frames of crashing thread:

0 mozglue.dll MOZ_CrashPrintf mfbt/Assertions.cpp:50
1 xul.dll InvalidArrayIndex_CRASH xpcom/ds/nsTArray.cpp:26
2 xul.dll nsTArray_Impl<nsGridContainerFrame::TrackSize, nsTArrayInfallibleAllocator>::operator[] xpcom/ds/nsTArray.h:1067
3 xul.dll mozilla::MediaEngineWebRTCMicrophoneSource::Stop dom/media/webrtc/MediaEngineWebRTCAudio.cpp:737
4 xul.dll mozilla::MediaDevice::Stop dom/media/MediaManager.cpp:1012
5 xul.dll <lambda_8b50338af7815249e386b19989dff0e5>::operator dom/media/MediaManager.cpp:4105
6 xul.dll mozilla::detail::RunnableFunction<<lambda_114078868a915abed36bc5ffa2449d96> >::Run xpcom/threads/nsThreadUtils.h:536
7 xul.dll MessageLoop::DoWork ipc/chromium/src/base/message_loop.cc:535
8 xul.dll mozilla::ipc::MessagePumpForNonMainUIThreads::DoRunLoop ipc/glue/MessagePump.cpp:403
9 xul.dll base::MessagePumpWin::RunWithDispatcher ipc/chromium/src/base/message_pump_win.cc:56

=============================================================

these content crashes are newly showing up during the 60.0b cycle - so far from windows only. probably it's related to the changes in bug 1299515.

the most common MOZ_CRASH Reason fieds in the reports are:
ElementAt(aIndex = 18446744073709551615, aLength = 0) 	18 	56.25 %
ElementAt(aIndex = 4294967295, aLength = 0)
(In reply to [:philipp] from comment #0)
> This bug was filed from the Socorro interface and is
> report bp-21c9e35e-b273-482c-9629-8a3720180407.
> =============================================================
> 
> Top 10 frames of crashing thread:
> 
> 0 mozglue.dll MOZ_CrashPrintf mfbt/Assertions.cpp:50
> 1 xul.dll InvalidArrayIndex_CRASH xpcom/ds/nsTArray.cpp:26
> 2 xul.dll nsTArray_Impl<nsGridContainerFrame::TrackSize,
> nsTArrayInfallibleAllocator>::operator[] xpcom/ds/nsTArray.h:1067
> 3 xul.dll mozilla::MediaEngineWebRTCMicrophoneSource::Stop
> dom/media/webrtc/MediaEngineWebRTCAudio.cpp:737
> 4 xul.dll mozilla::MediaDevice::Stop dom/media/MediaManager.cpp:1012
> 5 xul.dll <lambda_8b50338af7815249e386b19989dff0e5>::operator
> dom/media/MediaManager.cpp:4105
> 6 xul.dll
> mozilla::detail::RunnableFunction<<lambda_114078868a915abed36bc5ffa2449d96>
> >::Run xpcom/threads/nsThreadUtils.h:536
> 7 xul.dll MessageLoop::DoWork ipc/chromium/src/base/message_loop.cc:535
> 8 xul.dll mozilla::ipc::MessagePumpForNonMainUIThreads::DoRunLoop
> ipc/glue/MessagePump.cpp:403
> 9 xul.dll base::MessagePumpWin::RunWithDispatcher
> ipc/chromium/src/base/message_pump_win.cc:56
> 
> =============================================================
> 
> these content crashes are newly showing up during the 60.0b cycle - so far
> from windows only. probably it's related to the changes in bug 1299515.
> 
> the most common MOZ_CRASH Reason fieds in the reports are:
> ElementAt(aIndex = 18446744073709551615, aLength = 0) 	18 	56.25 %
This is an int64_t of value -1 (NoIndex)

> ElementAt(aIndex = 4294967295, aLength = 0)
This is an int32_t of value -1 (NoIndex)


Crash is at [1]. I assume a recent beta build turned off MOZ_DIAGNOSTIC_ASSERT for beta to be more like release?

I'm assigning myself to keep it on my radar.

[1] https://dxr.mozilla.org/mozilla-beta/rev/122c684d629a2d726a6a1f209c7b1e8966f46f25/dom/media/webrtc/MediaEngineWebRTCAudio.cpp#737
Assignee: nobody → apehrson
See Also: → 1449841, 1443976, 1443774
Crash Signature: [@ InvalidArrayIndex_CRASH | nsTArray_Impl<T>::operator[] | mozilla::MediaEngineWebRTCMicrophoneSource::Stop] → [@ InvalidArrayIndex_CRASH | nsTArray_Impl<T>::operator[] | mozilla::MediaEngineWebRTCMicrophoneSource::Stop] [@ InvalidArrayIndex_CRASH | nsTArray_Impl<T>::ElementAt | mozilla::MediaEngineWebRTCMicrophoneSource::Stop] [@ mozilla::MediaEngineWebRTCMicro…
(In reply to Andreas Pehrson [:pehrsons] from comment #1)
> I assume a recent beta build turned off
> MOZ_DIAGNOSTIC_ASSERT for beta to be more like release?

Oh, the diagnostic assert is for dev edition, not beta. My bad.
With us always crashing on index -1, the only explanation I see is that we're trying to Stop() from disabling on a source that had already been deallocated. There's an async step here where we have no main thread guard checking whether we had already stopped the source, which would be enough to cause this. [1]


[1] https://dxr.mozilla.org/mozilla-beta/rev/122c684d629a2d726a6a1f209c7b1e8966f46f25/dom/media/MediaManager.cpp#4034-4037,4105
Status: NEW → ASSIGNED
Rank: 9
Priority: -- → P1
Comment on attachment 8966173 [details]
Bug 1452472 - Guard against stopped source and removed listener after timer fired.

https://reviewboard.mozilla.org/r/234906/#review240576

::: dom/media/MediaManager.cpp:4183
(Diff revision 1)
> +      if (mRemoved) {
> +        // Listener was removed between timer resolving and this runnable.
> +        return DeviceOperationPromise::CreateAndResolve(NS_ERROR_ABORT, __func__);
> +      }
> +
> +      if (state.mStopped) {
> +        // Source was stopped between timer resolving and this runnable.
> +        return DeviceOperationPromise::CreateAndResolve(NS_ERROR_ABORT, __func__);
> +      }

I'd prefer combining these in one if, but either way. 

Maybe add a comment about shutdown here. This being a callback from a plain timer, we need to check wherther we're still relevant. A common pattern in MediaManager is:

    MediaManager* mgr = MediaManager::GetIfExists();
    if (!mgr) {
      return;
    }

But it looks redundant here provided mRemoved is guaranteed to be set in that case, is that true?

::: dom/media/MediaManager.cpp:4195
(Diff revision 1)
>        if (mWindowListener) {
>          mWindowListener->ChromeAffectingStateChanged();
>        }

This isn't going to mess up the in-chrome indicators is it? Have they already been called in the shutdown case?
Attachment #8966173 - Flags: review?(jib) → review+
Comment on attachment 8966173 [details]
Bug 1452472 - Guard against stopped source and removed listener after timer fired.

https://reviewboard.mozilla.org/r/234906/#review240576

> I'd prefer combining these in one if, but either way. 
> 
> Maybe add a comment about shutdown here. This being a callback from a plain timer, we need to check wherther we're still relevant. A common pattern in MediaManager is:
> 
>     MediaManager* mgr = MediaManager::GetIfExists();
>     if (!mgr) {
>       return;
>     }
> 
> But it looks redundant here provided mRemoved is guaranteed to be set in that case, is that true?

Depends on the shutdown you're referring to, but relevance is signaled through DeviceState::mStopped. If we have to rely on other state for relevance we have bugs.

If the shutdown is from XPCOM, Stop() will be called on main thread and state.mStopped set here, [1].
If you mean Shutdown() (in this case the call to release the sources for a certain window), Stop() will have been called on main thread and state.mStopped is set here, [2].
If you mean MSG shutdown (if it could happen without MediaManager shutdown), we also call Stop() on main thread before setting mRemoved, [3]->[4]. In light of this, checking mRemoved here doesn't seem needed. But I'm keeping it for consistency with other sites checking it, and perhaps I'm missing a case.

I don't want to rely on global state like MediaManager::GetIfExists(). Once we have MediaManager-per-document it won't be possible anyway.


[1] https://searchfox.org/mozilla-central/rev/7ccb618f45a1398e31a086a009f87c8fd3a790b6/dom/media/MediaManager.cpp#3475
[2] https://searchfox.org/mozilla-central/rev/7ccb618f45a1398e31a086a009f87c8fd3a790b6/dom/media/MediaManager.cpp#3291,3304
[3] https://searchfox.org/mozilla-central/rev/7ccb618f45a1398e31a086a009f87c8fd3a790b6/dom/media/MediaManager.cpp#4454
[4] https://searchfox.org/mozilla-central/rev/7ccb618f45a1398e31a086a009f87c8fd3a790b6/dom/media/MediaManager.cpp#4442

> This isn't going to mess up the in-chrome indicators is it? Have they already been called in the shutdown case?

No.

StopTrack() updates chrome indicators, which is guaranteed to be called per my other reply.

I spent a good while in the past ensuring that all shutdown paths share the same path of updating main thread state, including chrome indicators. If DeviceState::mStopped is set, chrome must have been notified.
Pushed by pehrsons@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/3b6ab919411b
Guard against stopped source and removed listener after timer fired. r=jib
I don't think we need to track this given the low crash frequency, but nice to see that the patch looks upliftable anyways.
https://hg.mozilla.org/mozilla-central/rev/3b6ab919411b
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Comment on attachment 8966173 [details]
Bug 1452472 - Guard against stopped source and removed listener after timer fired.

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1299515
[User impact if declined]: (Fairly small) risk of crashes on shutdown or window-close/navigation.
[Is this code covered by automated tests?]: Yes, indirectly. Bug 1449841, bug 1443976 and bug 1443774 all show it.
[Has the fix been verified in Nightly?]: The failure rate is not high enough to say for sure yet, but bug 1449841 which has the highest intermittent rate hasn't shown any new  occurrences since this landed. 
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: No
[Why is the change risky/not risky?]: It's adding two early exit guards for a rare case. The worst fallout from this is that this rare case (where we today crash) takes a different path and something else bad happens. I'm pretty certain it leads to a clean exit however, so consider this already rare case going bad low risk.
[String changes made/needed]: None
Attachment #8966173 - Flags: approval-mozilla-beta?
Comment on attachment 8966173 [details]
Bug 1452472 - Guard against stopped source and removed listener after timer fired.

Fix a new crash (and frequent orange) in Fx60. Approved for 60.0b12.
Attachment #8966173 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.