Closed Bug 1827021 Opened 2 years ago Closed 1 year ago

Crash in [@ mozilla::JsepTrackNegotiatedDetails::GetEncodingCount] on poison value

Categories

(Core :: WebRTC, defect)

Firefox 110
Unspecified
Android
defect

Tracking

()

RESOLVED FIXED
114 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox113 --- wontfix
firefox114 --- fixed

People

(Reporter: mccr8, Assigned: bwc)

References

Details

(4 keywords)

Crash Data

Crash report: https://crash-stats.mozilla.org/report/index/42828703-f8b6-4f3f-b24b-5b8bb0230403

Reason: SIGSEGV / SEGV_MAPERR

Top 10 frames of crashing thread:

0  libxul.so  std::__ndk1::vector<mozilla::UniquePtr<mozilla::JsepTrackEncoding, mozilla::DefaultDelete<mozilla::JsepTrackEncoding> >, std::__ndk1::allocator<mozilla::UniquePtr<mozilla::JsepTrackEncoding, mozilla::DefaultDelete<mozilla::JsepTrackEncoding> > > >::size const  /builds/worker/fetches/android-ndk/sources/cxx-stl/llvm-libc++/include/vector:656
0  libxul.so  mozilla::JsepTrackNegotiatedDetails::GetEncodingCount const  dom/media/webrtc/jsep/JsepTrack.h:42
0  libxul.so  mozilla::dom::RTCRtpTransceiver::GetNegotiatedRecvCodecs const  dom/media/webrtc/jsapi/RTCRtpTransceiver.cpp:708
0  libxul.so  mozilla::PeerConnectionImpl::GetCodecStats  dom/media/webrtc/jsapi/PeerConnectionImpl.cpp:3157
1  libxul.so  mozilla::PeerConnectionImpl::GetStats  dom/media/webrtc/jsapi/PeerConnectionImpl.cpp:3268
2  libxul.so  mozilla::PeerConnectionCtx::EverySecondTelemetryCallback_m  dom/media/webrtc/jsapi/PeerConnectionCtx.cpp:396
3  libxul.so  mozilla::detail::VariantImplementation<unsigned char,   mfbt/Variant.h
3  libxul.so  mozilla::detail::VariantImplementation<unsigned char,   mfbt/Variant.h:318
3  libxul.so  mozilla::detail::VariantImplementation<unsigned char,   mfbt/Variant.h:318
3  libxul.so  mozilla::detail::VariantImplementation<unsigned char,   mfbt/Variant.h:318

Looking back at the last 6 months, I only see these crashes on 110 and 111, even when I scan the protosignature for JsepTrackNegotiatedDetails, so I don't know if it might be a regression. There are 26 crashes with this signature on poison values in the last 6 months.

Flags: needinfo?(docfaraday)

Byron will triage this once he's back.

It looks like this was happening every 2-3 days on average until 112 was released last week. The bug might already be gone, but I'll look at this again later to see.

The rollout of 112 was halted very quickly due to a cookie regression, and only unfrozen with the release of 112.0.1 today, so I think it is too early to say anything about what is happening on 112 in release, unfortunately.

So, looking at this, it is not at all clear to me how we're ending up there. I'm pretty sure that the only way to observe this is for there to be a RefPtr<JsepTransceiver> holding onto a freed pointer. While JsepTransceiver has a threadsafe refcount (that it does not need), I guess there's some chance that a RefPtr<JsepTransceiver> itself is being used on more than one thread, which could lead to a bad pointer inside. Giving JsepTransceiver a non-threadsafe refcount is probably warranted, especially if it will help us detect a rare misuse of the RefPtr.

https://treeherder.mozilla.org/jobs?repo=try&revision=b91022146b9ad833d5aedd3435242908f5614ce4
https://treeherder.mozilla.org/jobs?repo=try&revision=dfbe8876eb7e00313e581b4f94494dd960214a6d

We might actually be able to ditch the refcount on JsepTransceiver entirely, now that I think about it. I'll look closer at that tomorrow.

Flags: needinfo?(docfaraday)
Depends on: 1829667

I found another little cluster of Jsep Android crashes. There are 16 like this in the last month.
I searched for: Fenix, "proto signature" contains SyncToJsep, "address" contains e5e5.

bp-8e25cd70-a942-4385-9a7f-71f830230421
bp-3da18505-c4db-4478-914d-b32900230423
bp-51381306-0101-468d-912b-a88c60230424

The signature is useless ([@ arena_dalloc | Allocator<T>::free | free | delete ]) and then there's a zillion frames of library code, but then there's
JsepTrack::UpdateStreamIds()
RTCRtpSender::SyncToJsep()
RTCRtpTransceiver::SyncToJsep()
mozilla::PeerConnectionImpl::SyncToJsep()

The RTCRtpSender::SyncToJsep() line it is crashing on is aJsepTransceiver.mSendTrack.UpdateStreamIds(streamIds);.

No longer depends on: 1829667
Flags: needinfo?(docfaraday)
Assignee: nobody → docfaraday

As far as I can tell, this has never happened on 113.

Status: NEW → RESOLVED
Closed: 1 year ago
Flags: needinfo?(docfaraday)
Resolution: --- → WORKSFORME
No longer blocks: webrtc-triage
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---

There may be a fix or the signature morphs in 114.

I searched for Fenix crashes with major version > 113, and I didn't see any crashes with either SyncToJsep or JsepTrackNegotiatedDetails in the proto signature. (On 113, doing that I see 2 crashes on poison values.)

Low volume, so we'll wait for 114 to roll out before we close this.

Whiteboard: [adv-main114+r]

Because of the low volume (and perhaps the devices involved) we have never seen this crash on a nightly or beta version. The fact we haven't seen any 114 crashes means nothing yet.

Since we didn't check in any fixes and don't even have suspicions of a patch that might have fixed it, calling this "fixed" anywhere is the wrong thing to do for a security bug (it triggers advisories, people will ask for the patch.... a mess). We don't have "worksforme" as a release-specific status so the next best choice would be "unaffected". If, that is, we had enough evidence that the crash was gone, which we don't yet.

Keywords: regression
Whiteboard: [adv-main114+r]
Version: unspecified → Firefox 110

(In reply to Daniel Veditz [:dveditz] from comment #12)

Because of the low volume (and perhaps the devices involved) we have never seen this crash on a nightly or beta version. The fact we haven't seen any 114 crashes means nothing yet.

Since we didn't check in any fixes and don't even have suspicions of a patch that might have fixed it, calling this "fixed" anywhere is the wrong thing to do for a security bug (it triggers advisories, people will ask for the patch.... a mess). We don't have "worksforme" as a release-specific status so the next best choice would be "unaffected". If, that is, we had enough evidence that the crash was gone, which we don't yet.

We did do work here in a related bug (in the see also list). That's why we think this might be fixed.

The "see also" is bug 1827022, which was marked WORKSFORME after nothing landed. Should this bug have a "depends on" or "see also" for bug 1829667? You removed the depends on 2023-04-25 and it wasn't clear to me why. Thanks.

Flags: needinfo?(jmathies)
Depends on: 1829667
Flags: needinfo?(jmathies)
Status: REOPENED → RESOLVED
Closed: 1 year ago1 year ago
Resolution: --- → FIXED
Group: media-core-security → core-security-release
Target Milestone: --- → 114 Branch
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.