Crash in [@ nsIEventTarget::IsOnCurrentThread]
Categories
(Core :: Audio/Video: Playback, defect, P2)
Tracking
()
People
(Reporter: kbrosnan, Assigned: alwu)
References
(Regression)
Details
(4 keywords, Whiteboard: [adv-main124+r][adv-esr115.9+r])
Crash Data
Attachments
(1 file)
48 bytes,
text/x-phabricator-request
|
diannaS
:
approval-mozilla-beta+
diannaS
:
approval-mozilla-esr115+
tjr
:
sec-approval+
|
Details | Review |
Crash report: https://crash-stats.mozilla.org/report/index/0a979a90-c801-4d1b-b8e6-d7faa0210106
Reason: SIGBUS / BUS_ADRALN
Top 10 frames of crashing thread:
0 libxul.so nsIEventTarget::IsOnCurrentThread xpcom/threads/nsThreadUtils.cpp:764
1 libxul.so mozilla::TransceiverImpl::UpdateDtlsTransportState dom/media/webrtc/jsapi/TransceiverImpl.cpp:115
2 libxul.so mozilla::runnable_args_memfn<mozilla::TransceiverImpl*, void dom/media/webrtc/transport/runnable_utils.h:121
3 libxul.so mozilla::detail::runnable_args_base< dom/media/webrtc/transport/runnable_utils.h:41
4 libxul.so mozilla::RunnableTask::Run xpcom/threads/TaskController.cpp:450
5 libxul.so mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal xpcom/threads/TaskController.cpp:720
6 libxul.so mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal xpcom/threads/TaskController.cpp:579
7 libxul.so mozilla::TaskController::ProcessPendingMTTask xpcom/threads/TaskController.cpp:373
8 libxul.so mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal xpcom/threads/nsThreadUtils.h:577
9 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1197
Crash address is poison values 0xe5e5e5e5e5e5e5ed and 0xe5e5e5e5e5e5ed are most of the crashes.
Comment 1•4 years ago
|
||
There were almost no crashes before a huge spike in September, which mostly got fixed. Maybe whatever fixed the spike in release (81?) missed a similar case that is hit less often and could be instructive?
Comment 2•4 years ago
|
||
Looks like this is fixed, could you dupe it to the right bug?
Updated•4 years ago
|
Comment 3•4 years ago
|
||
The September spike I assume was bug 1654399 comment 7, but it lists bug 1681025 as a sole regression from it, which has a different crash signature.
Mjf, is that the right one or was this fixed by something else?
Comment 4•4 years ago
|
||
The crashes that caused the spike in Sept were all seemingly related to calling RTCRtpReceiver::GetTransport() when the RTCRtpReceiver's mTransceiverImpl was null. The signature in the description here does seem different, although the graph looks somewhat better after the spike subsided than before the spike.
Since this signature is still seeing a small number of crash reports daily (all on Android, interestingly), I'm not sure I would call this fixed.
Comment 5•4 years ago
|
||
Seems mMainThread
here is free memory. Looks like a runnable ran out of runway. Could this one also be shutdown related?
Setting regression range based on age of code implicated.
Updated•4 years ago
|
Comment 6•4 years ago
|
||
Bug 1626278 might also be implicated. https://searchfox.org/mozilla-central/rev/bf03edd45ee0804f382c1f05ec88a05f5d88833c/dom/media/webrtc/jsapi/TransceiverImpl.cpp#101
Updated•4 years ago
|
Updated•4 years ago
|
Comment 7•4 years ago
|
||
Not a new issue, low crash volume, doesn't sound like I need to track this. Happy to uplift a fix if we have one soon though.
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Comment 8•4 years ago
|
||
No repro currently, stalled. A runnable on shutdown that fires too late.
Comment 9•4 years ago
|
||
Also, note, shutdown is being reworked currently through our current uplift work, so hopefully this will go away.
Updated•4 years ago
|
Updated•3 years ago
|
Comment 10•3 years ago
|
||
Doesn't appear to have gone away with the uplift work.
Comment 12•3 years ago
|
||
So, looking at crash-stats, the most recent crash with poisoned memory was back in 80 (fenix). All the rest seem to be nullptr crashes. Additionally, none of these crashes involve TransceiverImpl::UpdateDtlsTransportState (or any other webrtc code, for that matter). The crashes on crash-stats right now seem to be happening in RemoteVideoDecoder::SetSeekThreshold.
Updated•3 years ago
|
Comment 13•3 years ago
|
||
Does webrtc actually use SetSeekThreshold? Is this in the right component?
(In reply to Byron Campen [:bwc] from comment #13)
Does webrtc actually use SetSeekThreshold? Is this in the right component?
I don't know how much decoding machinery is used in WebRTC, so can't be sure, but based on that signatures is see for recent crashes it makes sense to me that the current reports are a different issue than in comment 0 and are likely playback specific.
Updated•2 years ago
|
Comment 15•1 year ago
|
||
We are going through stalled bugs trying to close ones that are not relevant anymore or still not actionable. Looking at this issue, it seems since its original triage it's changed - the most common signature has changed to be a null-pointer crash on Fenix where mThread is null when calling SetSeekThreshold. Could you take a second look at it, and see if you think it is still stalled? If not, it seems like this would no longer be a security bug also, if it is just a content process null deref crash.
Comment 16•1 year ago
|
||
I'm not sure how much I trust those stacks. Asking some questions about how much trust to put into the various stack info sources.
Comment 17•1 year ago
|
||
Ok, this looks like a decent stack:
https://crash-stats.mozilla.org/report/index/dab1f515-cbd2-4c66-8f75-491c30231213
I do see some other cases where this happens, but they're less common.
Comment 18•1 year ago
|
||
Looks like maybe we're using a non-Initted RemoveVideoDecoder?
Comment 19•10 months ago
|
||
(In reply to Byron Campen [:bwc] from comment #18)
Looks like maybe we're using a non-Initted RemoveVideoDecoder?
Yes. I would suspect this is MediaChangeMonitor
having detected some change so it recreates and inits its decoder. Then during that async init, it forwards a call to SetSeekThreshold.
I see three ways of fixing this:
- Make
RemoteVideoDecoder::mThread
const
, i.e. set it in the ctor - Handle the case where RemoteVideoDecoder::SetSeekThreshold runs and mThread is null
- Handle the case where mDecoder hasn't inited in MediaChangeMonitor::SetSeekThreshold by either,
- ignoring it, which it already seems to do when
mDecoder
is null, though it doesn't appear to ever get set to null - deferring until
mDecoder
has finished initing
- ignoring it, which it already seems to do when
I don't know the threading model well enough here to decide this. Maybe Alastor?
Assignee | ||
Comment 20•10 months ago
|
||
(In reply to Andreas Pehrson [:pehrsons] from comment #19)
Yes. I would suspect this is
MediaChangeMonitor
having detected some change so it recreates and inits its decoder. Then during that async init, it forwards a call to SetSeekThreshold.
This looks like a right assumption.
- deferring until
mDecoder
has finished initing
This sounds like a best solution to me.
Assignee | ||
Comment 21•10 months ago
|
||
Updated•10 months ago
|
Assignee | ||
Comment 22•10 months ago
|
||
Comment on attachment 9379327 [details]
Bug 1685358 - Apply seek threshold after decoder creation.
Security Approval Request
- How easily could an exploit be constructed based on the patch?: Hard, it's a racing between setting an attribute on a decoder and an initialization for a decoder, which is hard to reproduce.
- Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
- Which branches (beta, release, and/or ESR) are affected by this flaw, and do the release status flags reflect this affected/unaffected state correctly?: All
- If not all supported branches, which bug introduced the flaw?: None
- Do you have backports for the affected branches?: Yes
- If not, how different, hard to create, and risky will they be?: This patch should be able to backport to all branches, if not, then it's also easy to create a branch-specific patch.
- How likely is this patch to cause regressions; how much testing does it need?: Low. The attribute causing crash is just used for some optimization when outputting video samples after seeking. It shouldn't affect the video playback behavior so also less possible to cause any serious regressions.
- Is the patch ready to land after security approval is given?: Yes
- Is Android affected?: Unknown
Comment 23•9 months ago
|
||
Comment on attachment 9379327 [details]
Bug 1685358 - Apply seek threshold after decoder creation.
Approved to land and uplift
Assignee | ||
Comment 24•9 months ago
|
||
Comment on attachment 9379327 [details]
Bug 1685358 - Apply seek threshold after decoder creation.
Beta/Release Uplift Approval Request
- User impact if declined: Crash
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): This patch doesn't introduce new behavior and structural change, it's just about delay an operation to prevent us from accessing a non-initialized decoder.
- String changes made/needed:
- Is Android affected?: Unknown
Comment 25•9 months ago
|
||
Comment 26•9 months ago
|
||
Updated•9 months ago
|
Updated•9 months ago
|
Comment 27•9 months ago
|
||
Comment on attachment 9379327 [details]
Bug 1685358 - Apply seek threshold after decoder creation.
Approved for 124.0b4
Comment 28•9 months ago
|
||
uplift |
Updated•9 months ago
|
Comment 29•9 months ago
|
||
Comment on attachment 9379327 [details]
Bug 1685358 - Apply seek threshold after decoder creation.
Approved for 115.9esr
Comment 30•9 months ago
|
||
uplift |
Updated•9 months ago
|
Updated•9 months ago
|
Updated•9 months ago
|
Updated•9 months ago
|
Updated•9 months ago
|
Updated•2 months ago
|
Description
•