Closed Bug 1521304 Opened 1 year ago Closed 1 year ago

Assertion failure: IsAcquired() && mOwningThread == PR_GetCurrentThread(), at xpcom/threads/BlockingResourceBase.cpp:369

Categories

(Core :: WebRTC: Networking, defect, P1)

65 Branch
defect

Tracking

()

VERIFIED FIXED
mozilla67
Tracking Status
firefox-esr60 66+ verified
firefox64 --- wontfix
firefox65 + wontfix
firefox66 + verified
firefox67 + verified

People

(Reporter: bc, Assigned: jesup)

References

()

Details

(Keywords: assertion, csectype-race, sec-high, Whiteboard: [post-critsmash-triage][adv-main66+][adv-esr60.6+])

Attachments

(2 files)

  1. https://www.infowars.com/breaking-news/

Linux x86_64 Nightly. wait a few seconds.
Linux x86_64 Beta. wait a minute or two.

May require the live show to be playing in the page. It just stopped being reproducible when the "Watch Live now" started throwing HLS.js error networkError. Appears to be using https://videojs.com/. I wasn't able to reproduce with any examples at videojs.com. I'm going to file any way in the event the error is temporary and we will be able to reproduce later.

Opt doesn't appear to crash.

  1. Assertion failure: IsAcquired() && mOwningThread == PR_GetCurrentThread(), at xpcom/threads/BlockingResourceBase.cpp:369
    #01: mozilla::DataChannelConnection::ReceiveCallback(socket*, void*, unsigned long, sctp_rcvinfo, int) (netwerk/sctp/datachannel/DataChannel.cpp:2317)
    #02: mozilla::receive_cb(socket*, sctp_sockstore, void*, unsigned long, sctp_rcvinfo, int, void*) (netwerk/sctp/datachannel/DataChannel.cpp:241)
    #03: sctp_invoke_recv_callback (netwerk/sctp/src/netinet/sctputil.c:4859)
    #04: sctp_add_to_readq (netwerk/sctp/src/netinet/sctputil.c:4968)
    #05: sctp_notify_send_failed (netwerk/sctp/src/netinet/sctputil.c:3150)
    #06: sctp_ulp_notify (:?)
    #07: sctp_release_pr_sctp_chunk (netwerk/sctp/src/netinet/sctputil.c:5119)
    #08: sctp_t3rxt_timer (:?)
    #09: sctp_timeout_handler (netwerk/sctp/src/netinet/sctputil.c:1763)
    #10: sctp_handle_tick (netwerk/sctp/src/netinet/sctp_callout.c:167)
    #11: start_thread (pthread_create.c:?)
    #12: __GI___clone (:?)
    #13: ??? (???:???)

jesup: I made this security senstive because of bug 1493347. Could you take a look and see if this is really security sensitive?

Previously this url had thrown:

Assertion failure: false (MOZ_ASSERT_UNREACHABLE: Should always be able to find the appropriate sample description! Malformed mp4?), at /builds/worker/workspace/build/src/dom/media/mp4/Index.cpp:260

Assertion failure: false (MOZ_ASSERT_UNREACHABLE: Should always be able to find the appropriate sample description! Malformed mp4?), at z:/build/build/src/dom/media/mp4/Index.cpp:260
#01: mozilla::MP4TrackDemuxer::GetNextSample() [dom/media/mp4/MP4Demuxer.cpp:419]
#02: mozilla::MP4TrackDemuxer::GetSamples(int) [dom/media/mp4/MP4Demuxer.cpp:480]
#03: mozilla::TrackBuffersManager::DoDemuxVideo() [dom/media/mediasource/TrackBuffersManager.cpp:1396]
#04: mozilla::TrackBuffersManager::CodedFrameProcessing() [dom/media/mediasource/TrackBuffersManager.cpp:1366]
#05: mozilla::TrackBuffersManager::SegmentParserLoop() [dom/media/mediasource/TrackBuffersManager.cpp:809]
#06: nsresult mozilla::detail::RunnableMethodImpl<mozilla::TrackBuffersManager ,void (mozilla::TrackBuffersManager::)(),1,mozilla::RunnableKind::Standard>::Run() [xpcom/threads/nsThreadUtils.h:1161]
#07: mozilla::TaskQueue::Runner::Run() [xpcom/threads/TaskQueue.cpp:200]
#08: nsThreadPool::Run() [xpcom/threads/nsThreadPool.cpp:243]
#09: nsThread::ProcessNextEvent(bool,bool *) [xpcom/threads/nsThread.cpp:1154]
#10: NS_ProcessNextEvent(nsIThread *,bool) [xpcom/threads/nsThreadUtils.cpp:468]
#11: mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate *) [ipc/glue/MessagePump.cpp:304]
#12: MessageLoop::RunHandler() [ipc/chromium/src/base/message_loop.cc:309]
#13: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:291]
#14: nsThread::ThreadFunc(void *) [xpcom/threads/nsThread.cpp:452]
#15: PR_NativeRunThread [nsprpub/pr/src/threads/combined/pruthr.c:406]
#16: static unsigned int pr_root(void *) [nsprpub/pr/src/md/windows/w95thred.c:138]
#17: ucrtbase.dll + 0x1c4ce
#18: KERNEL32.DLL + 0x13dc4
#19: static void patched_BaseThreadInitThunk(int, void *, void *) [mozglue/build/WindowsDllBlocklist.cpp:724]
#20: ntdll.dll + 0x73691

On Windows and Linux.

Flags: needinfo?
Flags: needinfo? → needinfo?(rjesup)
Group: core-security → media-core-security
Flags: needinfo?(rjesup)

I presume from comment 0 that you need to run a debug build to see the crash; I tried an opt build (with audio off!!) and didn't have problems (plus bob indicated it may not repro anymore; it may have been caused by an ad).

It never crashed opt only the fatal assertion on debug. I was able to reproduce it today. It requires the "Watch Live now" video to be playing and visible before the assertion will fire.

I caught this in debug build by letting it run long enough.

124 Thread 0x7fff925fd700 (LWP 25991) "SCTP timer" 0x00007fffe0b3a558 in mozilla::OffTheBooksMutex::AssertCurrentThreadOwns (
this=0x7fff81b2b468) at /home/jesup/src/mozilla/inbound/xpcom/threads/BlockingResourceBase.cpp:369

It's asserting we don't own the mutex when invoked on the timer thread (connection timeout). Normally all callbacks with data occur on the thread we provided input on, where we lock the mutex before passing it in.

We can just lock on the output (since all input is on one thread), or do a TryLock() on callbacks (to catch the case of being called from the timer thread). Locking in the callbacks seems to make more sense, unless there are other requirements about usrsctp_* functions being called from one thread (or without overlapping with other calls). I'll try that first; if it doesn't work the TryLock() should

Our tests don't catch this since it depends on a connection timing out.

I suspect this is also the cause of the other long-standing SCTP sec bug

Stable on infowars; also tried our webrtc-landing datachannel tests.  Would be good to test other datachannel applications *on debug builds*
Attachment #9038020 - Flags: review?(lennart.grahl)
Attachment #9038020 - Flags: review?(dminor)
Assignee: nobody → rjesup
Status: NEW → ASSIGNED

I don't believe I have a decent enough overview regarding the threads used in that code to review. But anything you want me to test?

Comment on attachment 9038020 [details] [diff] [review]
Move some Mutexes

I'm not really familiar enough with this code as well, I was hoping Lennart would be able to do the review!

Nils, do you mind having a look? If you're not confident either, please feel free to flag me for review again and I'll do my best with this.
Attachment #9038020 - Flags: review?(dminor) → review?(drno)
Comment on attachment 9038020 [details] [diff] [review]
Move some Mutexes

Moving review to tuexen

Testing would be great!
Attachment #9038020 - Flags: review?(lennart.grahl) → review?(tuexen)

Note: this will be a prime candidate for any 65 point release. If we had 1 more week, I'd push to get it into 65.0 release

Attachment #9038020 - Flags: review?(tuexen) → review+

I "tested" it for two days with a constant data channel connection using Threema Web... if that helps. Nothing to report so far. :)

sec-high - missing lock means that the association failure could get processed on a different thread without locks, and so you can get nasty collisions on the shared structures, including UAFs.

This may be related to bug 1400563, which also involves the SCTP timer thread

Keywords: sec-moderatesec-high
Comment on attachment 9038020 [details] [diff] [review]
Move some Mutexes

Review of attachment 9038020 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM
Attachment #9038020 - Flags: review?(drno) → review+

Comment on attachment 9038020 [details] [diff] [review]
Move some Mutexes

Security Approval Request

How easily could an exploit be constructed based on the patch?

Not especially easy, but I wouldn't guarantee it's very hard. Knowing there's a missing mutex on that gives a big hint, and likely they could find the timer callback possibility eventually, and then they have to figure out what that message causes to happen under lock that could mess up the environment. However, this may be the cause of the long-standing clear UAF that involves the SCTP timer, so maybe.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

Yes

Which older supported branches are affected by this flaw?

all

If not all supported branches, which bug introduced the flaw?

None

Do you have backports for the affected branches?

No

If not, how different, hard to create, and risky will they be?

Trivial to create

How likely is this patch to cause regressions; how much testing does it need?

Regressions are unlikely, and would be deadlocks. Alternative patch would be to lock the connection in the callback (from the timer) if not already locked (i.e. on the timer thread). Much messier and puts a bigger bullseye on it

Beta/Release Uplift Approval Request

Feature/Bug causing the regression

None

User impact if declined

security risk

Is this code covered by automated tests?

No

Has the fix been verified in Nightly?

No

Needs manual test from QE?

No

If yes, steps to reproduce

Browse to the URL and let it sit, in a Debug build. I manually verified the fix locally after reproing the problem.

List of other uplifts needed

None

Risk to taking this patch

Low

Why is the change risky/not risky? (and alternatives if risky)

Just moves a lock

String changes made/needed

none

ESR Uplift Approval Request

If this is not a sec:{high,crit} bug, please state case for ESR consideration

Security bug

User impact if declined

Security risk

Fix Landed on Version

TBD

Risk to taking this patch

Low

Why is the change risky/not risky? (and alternatives if risky)

Just moves a lock

String or UUID changes made by this patch

none

Attachment #9038020 - Flags: sec-approval?
Attachment #9038020 - Flags: approval-mozilla-release?
Attachment #9038020 - Flags: approval-mozilla-esr60?
Attachment #9038020 - Flags: approval-mozilla-beta?

I'm not convinced we need to take this for a 65 dot release, but I'll track it accordingly for now. Note that we'd also need to dot release ESR 60.5 as well if we decided to go that route.

P1 based on sec-high rating.

Priority: -- → P1
Comment on attachment 9038020 [details] [diff] [review]
Move some Mutexes

Sec-approval for checkin on February 12, two weeks into the new cycle.
Attachment #9038020 - Flags: sec-approval? → sec-approval+
Whiteboard: [checkin on 2/12]

We're going to let this ride with Fx66/60.6esr.

Attachment #9038020 - Flags: approval-mozilla-release? → approval-mozilla-release-

ping on the approvals for beta and esr

Flags: needinfo?(ryanvm)
Flags: needinfo?(lhenry)

We'll get the ESR one when we're triaging the requests for 60.6 soon.

Flags: needinfo?(ryanvm)
Whiteboard: [checkin on 2/12]

Thanks - I'm waiting for this to land on trunk before uplift. Should make it just fine into the beta 8 build on Thursday.

Flags: needinfo?(lhenry)
Group: media-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67
Status: RESOLVED → REOPENED
Flags: needinfo?(rjesup)
Resolution: FIXED → ---
Target Milestone: mozilla67 → ---
This is much safer - it retains all existing lock patterns, and adds a lock if we get a callback on any thread other than the STS thread (which is what calls conninput on received packets).  Basically this should catch any calls coming from an SCTP timer event.  For checkin, I'll delete the comments since they draw extra attention to why this matters.  I got about 1 in 50 failures running the basicDataOnly test locally; with the update I've run several thousand under ASAN with no problems.  I'll also try letting it run on the infowars site, but I can't guarantee it'll hit the same race we did before in debug non-ASAN.
Attachment #9043508 - Flags: review?(drno)
Comment on attachment 9043508 [details] [diff] [review]
Move some Mutexes

Review of attachment 9043508 [details] [diff] [review]:
-----------------------------------------------------------------

This is a lot easier to understand than the previous patch.
A better name for the variable would help though.

::: netwerk/sctp/datachannel/DataChannel.cpp
@@ +2314,5 @@
>  
>    if (!data) {
>      LOG(("ReceiveCallback: SCTP has finished shutting down"));
>    } else {
> +    bool not_sts = false;

I think it would make more sense if the variable name refers to holding the lock locally, e.g. holding_lock or release_lock_locally.
Attachment #9043508 - Flags: review?(drno) → review+
Status: REOPENED → RESOLVED
Closed: 1 year ago1 year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67
Comment on attachment 9038020 [details] [diff] [review]
Move some Mutexes

OK for beta uplift, should land for beta 8.
Attachment #9038020 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Flags: qe-verify+
Whiteboard: [post-critsmash-triage]
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][qa-triaged]

Hello,

I have reproduced this issue with the asan debug build 65.0b12(buildID: 20190117232427).

Confirming this issue as verified fixed on 66.0b10(buildID:20190221023827) and 67.0a1(buildID:20190221102221). Verified on Windows 10x64 and Ubuntu 16.04x64.

Status: RESOLVED → VERIFIED
Flags: qe-verify+
Comment on attachment 9038020 [details] [diff] [review]
Move some Mutexes

Fixes a sec bug, verified by QA. Approved for 60.6esr.
Attachment #9038020 - Flags: approval-mozilla-esr60? → approval-mozilla-esr60+
QA Whiteboard: [qa-triaged]
Whiteboard: [post-critsmash-triage][qa-triaged] → [post-critsmash-triage]

Hello,

Confirming this issue as verified fixed on ESR 60.6.0-build1. Verified on Windows 10x64 and Ubuntu 16.04.

Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main66+][adv-esr60.6+]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.