Closed Bug 1685358 Opened 4 years ago Closed 9 months ago

Crash in [@ nsIEventTarget::IsOnCurrentThread]

Categories

(Core :: Audio/Video: Playback, defect, P2)

Unspecified
Android
defect

Tracking

()

RESOLVED FIXED
125 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox-esr115 124+ fixed
firefox85 --- wontfix
firefox86 --- wontfix
firefox87 - wontfix
firefox88 --- wontfix
firefox123 --- wontfix
firefox124 + fixed
firefox125 + fixed

People

(Reporter: kbrosnan, Assigned: alwu)

References

(Regression)

Details

(4 keywords, Whiteboard: [adv-main124+r][adv-esr115.9+r])

Crash Data

Attachments

(1 file)

Crash report: https://crash-stats.mozilla.org/report/index/0a979a90-c801-4d1b-b8e6-d7faa0210106

Reason: SIGBUS / BUS_ADRALN

Top 10 frames of crashing thread:

0 libxul.so nsIEventTarget::IsOnCurrentThread xpcom/threads/nsThreadUtils.cpp:764
1 libxul.so mozilla::TransceiverImpl::UpdateDtlsTransportState dom/media/webrtc/jsapi/TransceiverImpl.cpp:115
2 libxul.so mozilla::runnable_args_memfn<mozilla::TransceiverImpl*, void  dom/media/webrtc/transport/runnable_utils.h:121
3 libxul.so mozilla::detail::runnable_args_base< dom/media/webrtc/transport/runnable_utils.h:41
4 libxul.so mozilla::RunnableTask::Run xpcom/threads/TaskController.cpp:450
5 libxul.so mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal xpcom/threads/TaskController.cpp:720
6 libxul.so mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal xpcom/threads/TaskController.cpp:579
7 libxul.so mozilla::TaskController::ProcessPendingMTTask xpcom/threads/TaskController.cpp:373
8 libxul.so mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal xpcom/threads/nsThreadUtils.h:577
9 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1197

Crash address is poison values 0xe5e5e5e5e5e5e5ed and 0xe5e5e5e5e5e5ed are most of the crashes.

There were almost no crashes before a huge spike in September, which mostly got fixed. Maybe whatever fixed the spike in release (81?) missed a similar case that is hit less often and could be instructive?

Looks like this is fixed, could you dupe it to the right bug?

Group: mobile-core-security → core-security
Component: General → WebRTC
Product: GeckoView → Core
Group: core-security → media-core-security

The September spike I assume was bug 1654399 comment 7, but it lists bug 1681025 as a sole regression from it, which has a different crash signature.

Mjf, is that the right one or was this fixed by something else?

Flags: needinfo?(mfroman)

The crashes that caused the spike in Sept were all seemingly related to calling RTCRtpReceiver::GetTransport() when the RTCRtpReceiver's mTransceiverImpl was null. The signature in the description here does seem different, although the graph looks somewhat better after the spike subsided than before the spike.

Since this signature is still seeing a small number of crash reports daily (all on Android, interestingly), I'm not sure I would call this fixed.

Flags: needinfo?(mfroman)

Seems mMainThread here is free memory. Looks like a runnable ran out of runway. Could this one also be shutdown related?

Setting regression range based on age of code implicated.

Has Regression Range: --- → yes

Not a new issue, low crash volume, doesn't sound like I need to track this. Happy to uplift a fix if we have one soon though.

No repro currently, stalled. A runnable on shutdown that fires too late.

Keywords: stalled

Also, note, shutdown is being reworked currently through our current uplift work, so hopefully this will go away.

Severity: -- → S2
Priority: -- → P2
No longer blocks: webrtc-triage

Doesn't appear to have gone away with the uplift work.

Hey John, any ideas on the cause of this?

Flags: needinfo?(jolin)

So, looking at crash-stats, the most recent crash with poisoned memory was back in 80 (fenix). All the rest seem to be nullptr crashes. Additionally, none of these crashes involve TransceiverImpl::UpdateDtlsTransportState (or any other webrtc code, for that matter). The crashes on crash-stats right now seem to be happening in RemoteVideoDecoder::SetSeekThreshold.

No longer blocks: webrtc-triage

Does webrtc actually use SetSeekThreshold? Is this in the right component?

Flags: needinfo?(bvandyk)

(In reply to Byron Campen [:bwc] from comment #13)

Does webrtc actually use SetSeekThreshold? Is this in the right component?

I don't know how much decoding machinery is used in WebRTC, so can't be sure, but based on that signatures is see for recent crashes it makes sense to me that the current reports are a different issue than in comment 0 and are likely playback specific.

Flags: needinfo?(bvandyk)
Component: WebRTC → Audio/Video: Playback

We are going through stalled bugs trying to close ones that are not relevant anymore or still not actionable. Looking at this issue, it seems since its original triage it's changed - the most common signature has changed to be a null-pointer crash on Fenix where mThread is null when calling SetSeekThreshold. Could you take a second look at it, and see if you think it is still stalled? If not, it seems like this would no longer be a security bug also, if it is just a content process null deref crash.

Flags: needinfo?(docfaraday)

I'm not sure how much I trust those stacks. Asking some questions about how much trust to put into the various stack info sources.

Flags: needinfo?(docfaraday)

Ok, this looks like a decent stack:

https://crash-stats.mozilla.org/report/index/dab1f515-cbd2-4c66-8f75-491c30231213

I do see some other cases where this happens, but they're less common.

Looks like maybe we're using a non-Initted RemoveVideoDecoder?

(In reply to Byron Campen [:bwc] from comment #18)

Looks like maybe we're using a non-Initted RemoveVideoDecoder?

Yes. I would suspect this is MediaChangeMonitor having detected some change so it recreates and inits its decoder. Then during that async init, it forwards a call to SetSeekThreshold.

I see three ways of fixing this:

I don't know the threading model well enough here to decide this. Maybe Alastor?

Flags: needinfo?(alwu)

(In reply to Andreas Pehrson [:pehrsons] from comment #19)

Yes. I would suspect this is MediaChangeMonitor having detected some change so it recreates and inits its decoder. Then during that async init, it forwards a call to SetSeekThreshold.

This looks like a right assumption.

  • deferring until mDecoder has finished initing

This sounds like a best solution to me.

Assignee: nobody → alwu
Flags: needinfo?(jolin)
Flags: needinfo?(alwu)
Status: NEW → ASSIGNED
Keywords: stalled
Attachment #9379327 - Attachment description: Bug 1685358 - some fixes for MediaChangeMonitor. → Bug 1685358 - Apply seek threshold after decoder creation.

Comment on attachment 9379327 [details]
Bug 1685358 - Apply seek threshold after decoder creation.

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: Hard, it's a racing between setting an attribute on a decoder and an initialization for a decoder, which is hard to reproduce.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which branches (beta, release, and/or ESR) are affected by this flaw, and do the release status flags reflect this affected/unaffected state correctly?: All
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: Yes
  • If not, how different, hard to create, and risky will they be?: This patch should be able to backport to all branches, if not, then it's also easy to create a branch-specific patch.
  • How likely is this patch to cause regressions; how much testing does it need?: Low. The attribute causing crash is just used for some optimization when outputting video samples after seeking. It shouldn't affect the video playback behavior so also less possible to cause any serious regressions.
  • Is the patch ready to land after security approval is given?: Yes
  • Is Android affected?: Unknown
Attachment #9379327 - Flags: sec-approval?

Comment on attachment 9379327 [details]
Bug 1685358 - Apply seek threshold after decoder creation.

Approved to land and uplift

Attachment #9379327 - Flags: sec-approval? → sec-approval+

Comment on attachment 9379327 [details]
Bug 1685358 - Apply seek threshold after decoder creation.

Beta/Release Uplift Approval Request

  • User impact if declined: Crash
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This patch doesn't introduce new behavior and structural change, it's just about delay an operation to prevent us from accessing a non-initialized decoder.
  • String changes made/needed:
  • Is Android affected?: Unknown
Attachment #9379327 - Flags: approval-mozilla-beta?
Pushed by alwu@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/5901552401da Apply seek threshold after decoder creation. r=media-playback-reviewers,padenot
Group: media-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → 125 Branch
Attachment #9379327 - Flags: approval-mozilla-esr115?

Comment on attachment 9379327 [details]
Bug 1685358 - Apply seek threshold after decoder creation.

Approved for 124.0b4

Attachment #9379327 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9379327 [details]
Bug 1685358 - Apply seek threshold after decoder creation.

Approved for 115.9esr

Attachment #9379327 - Flags: approval-mozilla-esr115? → approval-mozilla-esr115+
QA Whiteboard: [post-critsmash-triage]
QA Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main124+r][adv-esr124+r]
QA Whiteboard: [post-critsmash-triage][adv-main124+r][adv-esr124+r] → [post-critsmash-triage]
Whiteboard: [adv-main124+r][adv-esr124+r]
Whiteboard: [adv-main124+r][adv-esr124+r] → [adv-main124+r][adv-esr115.9+r]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: