Crash in [@ mozilla::JsepTrackNegotiatedDetails::GetEncodingCount] on poison value
Categories
(Core :: WebRTC, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr102 | --- | unaffected |
firefox113 | --- | wontfix |
firefox114 | --- | fixed |
People
(Reporter: mccr8, Assigned: bwc)
References
Details
(4 keywords)
Crash Data
Crash report: https://crash-stats.mozilla.org/report/index/42828703-f8b6-4f3f-b24b-5b8bb0230403
Reason: SIGSEGV / SEGV_MAPERR
Top 10 frames of crashing thread:
0 libxul.so std::__ndk1::vector<mozilla::UniquePtr<mozilla::JsepTrackEncoding, mozilla::DefaultDelete<mozilla::JsepTrackEncoding> >, std::__ndk1::allocator<mozilla::UniquePtr<mozilla::JsepTrackEncoding, mozilla::DefaultDelete<mozilla::JsepTrackEncoding> > > >::size const /builds/worker/fetches/android-ndk/sources/cxx-stl/llvm-libc++/include/vector:656
0 libxul.so mozilla::JsepTrackNegotiatedDetails::GetEncodingCount const dom/media/webrtc/jsep/JsepTrack.h:42
0 libxul.so mozilla::dom::RTCRtpTransceiver::GetNegotiatedRecvCodecs const dom/media/webrtc/jsapi/RTCRtpTransceiver.cpp:708
0 libxul.so mozilla::PeerConnectionImpl::GetCodecStats dom/media/webrtc/jsapi/PeerConnectionImpl.cpp:3157
1 libxul.so mozilla::PeerConnectionImpl::GetStats dom/media/webrtc/jsapi/PeerConnectionImpl.cpp:3268
2 libxul.so mozilla::PeerConnectionCtx::EverySecondTelemetryCallback_m dom/media/webrtc/jsapi/PeerConnectionCtx.cpp:396
3 libxul.so mozilla::detail::VariantImplementation<unsigned char, mfbt/Variant.h
3 libxul.so mozilla::detail::VariantImplementation<unsigned char, mfbt/Variant.h:318
3 libxul.so mozilla::detail::VariantImplementation<unsigned char, mfbt/Variant.h:318
3 libxul.so mozilla::detail::VariantImplementation<unsigned char, mfbt/Variant.h:318
Looking back at the last 6 months, I only see these crashes on 110 and 111, even when I scan the protosignature for JsepTrackNegotiatedDetails, so I don't know if it might be a regression. There are 26 crashes with this signature on poison values in the last 6 months.
Updated•2 years ago
|
Updated•2 years ago
|
Comment 1•2 years ago
|
||
Byron will triage this once he's back.
Assignee | ||
Comment 2•2 years ago
|
||
It looks like this was happening every 2-3 days on average until 112 was released last week. The bug might already be gone, but I'll look at this again later to see.
Reporter | ||
Comment 3•2 years ago
|
||
The rollout of 112 was halted very quickly due to a cookie regression, and only unfrozen with the release of 112.0.1 today, so I think it is too early to say anything about what is happening on 112 in release, unfortunately.
Assignee | ||
Comment 4•2 years ago
|
||
So, looking at this, it is not at all clear to me how we're ending up there. I'm pretty sure that the only way to observe this is for there to be a RefPtr<JsepTransceiver> holding onto a freed pointer. While JsepTransceiver has a threadsafe refcount (that it does not need), I guess there's some chance that a RefPtr<JsepTransceiver> itself is being used on more than one thread, which could lead to a bad pointer inside. Giving JsepTransceiver a non-threadsafe refcount is probably warranted, especially if it will help us detect a rare misuse of the RefPtr.
https://treeherder.mozilla.org/jobs?repo=try&revision=b91022146b9ad833d5aedd3435242908f5614ce4
https://treeherder.mozilla.org/jobs?repo=try&revision=dfbe8876eb7e00313e581b4f94494dd960214a6d
Assignee | ||
Comment 5•2 years ago
|
||
We might actually be able to ditch the refcount on JsepTransceiver entirely, now that I think about it. I'll look closer at that tomorrow.
Reporter | ||
Comment 6•2 years ago
|
||
I found another little cluster of Jsep Android crashes. There are 16 like this in the last month.
I searched for: Fenix, "proto signature" contains SyncToJsep, "address" contains e5e5.
bp-8e25cd70-a942-4385-9a7f-71f830230421
bp-3da18505-c4db-4478-914d-b32900230423
bp-51381306-0101-468d-912b-a88c60230424
The signature is useless ([@ arena_dalloc | Allocator<T>::free | free | delete ]) and then there's a zillion frames of library code, but then there's
JsepTrack::UpdateStreamIds()
RTCRtpSender::SyncToJsep()
RTCRtpTransceiver::SyncToJsep()
mozilla::PeerConnectionImpl::SyncToJsep()
The RTCRtpSender::SyncToJsep() line it is crashing on is aJsepTransceiver.mSendTrack.UpdateStreamIds(streamIds);
.
Updated•2 years ago
|
Updated•2 years ago
|
Assignee | ||
Comment 7•2 years ago
|
||
As far as I can tell, this has never happened on 113.
Updated•2 years ago
|
Assignee | ||
Comment 8•2 years ago
|
||
Finally saw this on 113:
https://crash-stats.mozilla.org/report/index/28259ba1-46c0-4d4d-84ab-5ce350230510
Comment 9•2 years ago
•
|
||
There may be a fix or the signature morphs in 114.
Reporter | ||
Comment 10•2 years ago
|
||
I searched for Fenix crashes with major version > 113, and I didn't see any crashes with either SyncToJsep or JsepTrackNegotiatedDetails in the proto signature. (On 113, doing that I see 2 crashes on poison values.)
Reporter | ||
Updated•2 years ago
|
Comment 11•2 years ago
|
||
Low volume, so we'll wait for 114 to roll out before we close this.
Updated•1 years ago
|
Updated•1 years ago
|
Comment 12•1 years ago
|
||
Because of the low volume (and perhaps the devices involved) we have never seen this crash on a nightly or beta version. The fact we haven't seen any 114 crashes means nothing yet.
Since we didn't check in any fixes and don't even have suspicions of a patch that might have fixed it, calling this "fixed" anywhere is the wrong thing to do for a security bug (it triggers advisories, people will ask for the patch.... a mess). We don't have "worksforme" as a release-specific status so the next best choice would be "unaffected". If, that is, we had enough evidence that the crash was gone, which we don't yet.
Comment 13•1 years ago
|
||
(In reply to Daniel Veditz [:dveditz] from comment #12)
Because of the low volume (and perhaps the devices involved) we have never seen this crash on a nightly or beta version. The fact we haven't seen any 114 crashes means nothing yet.
Since we didn't check in any fixes and don't even have suspicions of a patch that might have fixed it, calling this "fixed" anywhere is the wrong thing to do for a security bug (it triggers advisories, people will ask for the patch.... a mess). We don't have "worksforme" as a release-specific status so the next best choice would be "unaffected". If, that is, we had enough evidence that the crash was gone, which we don't yet.
We did do work here in a related bug (in the see also list). That's why we think this might be fixed.
Reporter | ||
Comment 14•1 years ago
|
||
The "see also" is bug 1827022, which was marked WORKSFORME after nothing landed. Should this bug have a "depends on" or "see also" for bug 1829667? You removed the depends on 2023-04-25 and it wasn't clear to me why. Thanks.
Updated•1 years ago
|
Updated•1 years ago
|
Updated•1 year ago
|
Description
•