Closed Bug 1724101 Opened 3 years ago Closed 3 years ago

ThreadSanitizer: data race [@ assign_assuming_AddRef] vs. [@ get] with nsHtml5StreamParser

Categories

(Core :: DOM: HTML Parser, defect)

defect

Tracking

()

RESOLVED FIXED
93 Branch
Tracking Status
firefox-esr78 92+ fixed
firefox-esr91 92+ fixed
firefox91 --- wontfix
firefox92 + fixed
firefox93 + fixed

People

(Reporter: tsmith, Assigned: hsivonen)

References

(Blocks 2 open bugs)

Details

(Keywords: csectype-race, csectype-uaf, sec-moderate, Whiteboard: [adv-main92+r][adv-esr78.14+r][adv-esr91.1+r])

Attachments

(2 files)

The attached crash information was detected by ThreadSanitizer while fuzzing build mozilla-central 20210716-160071ad9de0. I will attach the test case once it is reduced.

General information about TSan reports

Why fix races?

Data races are undefined behavior and can cause crashes as well as correctness issues. Compiler optimizations can cause racy code to have unpredictable and hard-to-reproduce behavior.

Rating

If you think this race can cause crashes or correctness issues, it would be great to rate the bug appropriately as P1/P2 and/or indicating this in the bug. This makes it a lot easier for us to assess the actual impact that these reports make and if they are helpful to you.

False Positives / Benign Races

Typically, races reported by TSan are not false positives [1], but it is possible that the race is benign. Even in this case it would be nice to come up with a fix if it is easily doable and does not regress performance. Every race that we cannot fix will have to remain on the suppression list and slows down the overall TSan performance. Also note that seemingly benign races can possibly be harmful (also depending on the compiler, optimizations and the architecture) [2][3].

[1] One major exception is the involvement of uninstrumented code from third-party libraries.
[2] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
[3] How to miscompile programs with "benign" data races: https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf

Suppressing unfixable races

If the bug cannot be fixed, then a runtime suppression needs to be added in mozglue/build/TsanOptions.cpp. The suppressions match on the full stack, so it should be picked such that it is unique to this particular race. The bug number of this bug should also be included so we have some documentation on why this suppression was added.

This looks like a potential use-after-free to me.

Keywords: csectype-uaf
Summary: ThreadSanitizer: data race [@ assign_assuming_AddRef] vs. [@ get] → ThreadSanitizer: data race [@ assign_assuming_AddRef] vs. [@ get] with nsHtml5StreamParser
Flags: needinfo?(hsivonen)

The stacks here appears to explain bug 1642314.

Bug 1642314 comment 1 explains the (evidently incorrect) reasoning why the code is the way it is in the first place.

Bug 1642086 is the opposite of this one. Bug 1642086 comment 3 suggests a bit silly-looking but correct fix for that bug. Extending that idea to this bug would involve adding yet another runnable round trip, but that seems even sillier.

Perhaps this pointer needs to become atomic.

Leaving myself needinfoed so I don't forget about this.

Making the pointer ReleaseAcquire atomic should work. nsHtml5StreamListener itself is thread-safely refcounted, so it itself stays alive and reachable by the Necko thread. If the delegate is atomically exchanged to nullptr, if the null checks are changed to atomic load, it should be true that if the delegate has been dropped on the main thread to the point that the stream parser's destructor has already run (on the main thread) and the delegate is, therefore, no longer usable, the load should see a nullptr. AFAICT, ReleaseAcquire would guarantee a relationship between the non-atomic destructor code and the atomic write. I.e. if the atomic write of nullptr should be guaranteed to be observable on other threads before it's possible to reach destructed stream parser from the Necko thread.

It's somewhat unfortunate that this means that every Necko event makes the core running the Necko thread sync cache with the core that runs the main thread, but I expect there to be enough actual locks for this to be negligible compared to whatever else is going on anyway.

Hmm. That's not quite right, because that doesn't ensure that whatever the Necko thread does with the obtained pointer happens before the main-thread destructor of the pointee.

The relationship between the parser thread and the main thread is managed in such a way, that we always make the parser thread no longer operate on a given stream parser before the stream parser goes away, but I guess we don't truly have a guarantee that Necko stops delivering events to the listener before the listener's delegate has been destructed.

It looks a lot like we need a real mutex here. :-(

Flags: needinfo?(hsivonen)
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED

Landed:
https://hg.mozilla.org/integration/autoland/rev/f29e3aea6a16b57c02a4ca2d84ce19039144782c

Backed out for multiple failures in Mutex_posix.cpp:
https://hg.mozilla.org/integration/autoland/rev/bcf78d41751a094c06a70ed9e903be200063fdb9

Push with failures: https://treeherder.mozilla.org/jobs?repo=autoland&group_state=expanded&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel&revision=f29e3aea6a16b57c02a4ca2d84ce19039144782c
Failure log example: https://treeherder.mozilla.org/logviewer?job_id=348249763&repo=autoland

[Parent 1540, Main Thread] WARNING: NS_ENSURE_TRUE(rootFrame) failed: file /builds/worker/checkouts/gecko/dom/base/nsGlobalWindowOuter.cpp:4232
###!!! ERROR: Potential deadlock detected:
=== Cyclical dependency starts at
--- Mutex : nsHtml5StreamListener mDelegateMutex (currently acquired)
 calling context
  [stack trace unavailable]
=== Cycle completed at
--- Mutex : nsHtml5StreamListener mDelegateMutex (currently acquired)
 calling context
  [stack trace unavailable]
###!!! Deadlock may happen NOW!
[Parent 1540, Main Thread] ###!!! ASSERTION: Potential deadlock detected:
Cyclical dependency starts at
Mutex : nsHtml5StreamListener mDelegateMutex (currently acquired)
Cycle completed at
Mutex : nsHtml5StreamListener mDelegateMutex (currently acquired)
###!!! Deadlock may happen NOW!
: 'Error', file /builds/worker/checkouts/gecko/xpcom/threads/BlockingResourceBase.cpp:245
stack-fixing for the first stack frame, this may take a while...
#01: NS_DebugBreak [xpcom/base/nsDebugImpl.cpp:431]
#02: mozilla::BlockingResourceBase::CheckAcquire() [xpcom/threads/BlockingResourceBase.cpp:247]
#03: mozilla::OffTheBooksMutex::Lock() [xpcom/threads/BlockingResourceBase.cpp:311]
#04: nsHtml5StreamListener::GetDelegate() [parser/html/nsHtml5StreamListener.cpp:51]
#05: nsHtml5Parser::GetChannel(nsIChannel**) [parser/html/nsHtml5Parser.cpp:108]
#06: nsHtml5Parser::StartTokenizer(bool) [parser/html/nsHtml5Parser.cpp:673]
#07: nsHtml5StreamParser::OnStartRequest(nsIRequest*) [parser/html/nsHtml5StreamParser.cpp:1234]
#08: nsHtml5StreamListener::OnStartRequest(nsIRequest*) [parser/html/nsHtml5StreamListener.cpp:68]
#09: nsDocumentOpenInfo::OnStartRequest(nsIRequest*) [uriloader/base/nsURILoader.cpp:166]
#10: mozilla::net::DocumentLoadListener::ResumeSuspendedChannel(nsIStreamListener*) [netwerk/ipc/DocumentLoadListener.cpp:1296]
#11: mozilla::net::DocumentLoadListener::FinishReplacementChannelSetup(nsresult) [netwerk/ipc/DocumentLoadListener.cpp:1206]
#12: mozilla::net::DocumentLoadListener::RedirectToRealChannelFinished(nsresult) [netwerk/ipc/DocumentLoadListener.cpp:1147]
#13: mozilla::MozPromise<nsresult, mozilla::ipc::ResponseRejectReason, true>::ThenValue<mozilla::net::DocumentLoadListener::TriggerRedirectToRealChannel(mozilla::Maybe<mozilla::dom::ContentParent*> const&)::$_24, mozilla::net::DocumentLoadListener::TriggerRedirectToRealChannel(mozilla::Maybe<mozilla::dom::ContentParent*> const&)::$_25>::DoResolveOrRejectInternal(mozilla::MozPromise<nsresult, mozilla::ipc::ResponseRejectReason, true>::ResolveOrRejectValue&) [xpcom/threads/MozPromise.h:846]
#14: mozilla::MozPromise<nsresult, mozilla::ipc::ResponseRejectReason, true>::ThenValueBase::ResolveOrRejectRunnable::Run() [xpcom/threads/MozPromise.h:488]
#15: mozilla::RunnableTask::Run() [xpcom/threads/TaskController.cpp:503]
#16: mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) [xpcom/threads/TaskController.cpp:805]
#17: mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) [xpcom/threads/TaskController.cpp:641]
#18: mozilla::TaskController::ProcessPendingMTTask(bool) [xpcom/threads/TaskController.cpp:425]
#19: mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::$_0>::Run() [xpcom/threads/nsThreadUtils.h:533]
#20: nsThread::ProcessNextEvent(bool, bool*) [xpcom/threads/nsThread.cpp:1152]
#21: NS_ProcessNextEvent(nsIThread*, bool) [xpcom/threads/nsThreadUtils.cpp:466]
#22: mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) [ipc/glue/MessagePump.cpp:85]
#23: MessageLoop::RunInternal() [ipc/chromium/src/base/message_loop.cc:331]
#24: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:307]
#25: nsBaseAppShell::Run() [widget/nsBaseAppShell.cpp:139]
#26: nsAppStartup::Run() [toolkit/components/startup/nsAppStartup.cpp:275]
#27: XREMain::XRE_mainRun() [toolkit/xre/nsAppRunner.cpp:5287]
#28: XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) [toolkit/xre/nsAppRunner.cpp:5479]
#29: XRE_main(int, char**, mozilla::BootstrapConfig const&) [toolkit/xre/nsAppRunner.cpp:5538]
#30: ??? [/builds/worker/workspace/build/application/firefox/firefox + 0x41985]
#31: __libc_start_main [/lib/x86_64-linux-gnu/libc.so.6 + 0x21b97]
#32: ??? [/builds/worker/workspace/build/application/firefox/firefox + 0x41569]
#33: ??? (???:???)
Flags: needinfo?(hsivonen)
Attachment #9235978 - Attachment description: Bug 1724101 - Protect nsHtml5StreamListener::mDelegate with a mutex. → Bug 1724101 - Protect nsHtml5StreamListener::mDelegate with a monitor.
Group: dom-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 93 Branch

Please nominate this for Beta & ESR91 approval when you get a chance. It also grafts pretty cleanly to ESR78 (just a minor moz.build conflict), but I'm not sure if it's worth given how close to EOL it is. Feel free to nominate if you think we should take it, though.

Flags: needinfo?(hsivonen)

Comment on attachment 9235978 [details]
Bug 1724101 - Protect nsHtml5StreamListener::mDelegate with a monitor.

Beta/Release Uplift Approval Request

  • User impact if declined: A use-after-free seems possible under an unlikely thread scheduling combination: If the Necko thread in unscheduled between a null check and object use such that the main thread manages to complete an event loop spin in between.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce: Can't provide steps to repro, since the bad scenario depends on very unlikely thread scheduling combination.
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This patch is low-risk, because it adds a thread synchronization mechanism without changing other code flow. The synchronization primitive should be unable to deadlock with other synchronization mechanism.
  • String changes made/needed: None

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: Theoretically possible—even if in practice very unlikely—use-after-free.
  • User impact if declined: A use-after-free seems possible under an unlikely thread scheduling combination: If the Necko thread in unscheduled between a null check and object use such that the main thread manages to complete an event loop spin in between.
  • Fix Landed on Version: 93
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This patch is low-risk, because it adds a thread synchronization mechanism without changing other code flow. The synchronization primitive should be unable to deadlock with other synchronization mechanism.
  • String or UUID changes made by this patch: None
Flags: needinfo?(hsivonen)
Attachment #9235978 - Flags: approval-mozilla-esr91?
Attachment #9235978 - Flags: approval-mozilla-esr78?
Attachment #9235978 - Flags: approval-mozilla-beta?

Comment on attachment 9235978 [details]
Bug 1724101 - Protect nsHtml5StreamListener::mDelegate with a monitor.

Approved for 92.0b4, 91.1esr, and 78.14esr.

Attachment #9235978 - Flags: approval-mozilla-esr91?
Attachment #9235978 - Flags: approval-mozilla-esr91+
Attachment #9235978 - Flags: approval-mozilla-esr78?
Attachment #9235978 - Flags: approval-mozilla-esr78+
Attachment #9235978 - Flags: approval-mozilla-beta?
Attachment #9235978 - Flags: approval-mozilla-beta+
QA Whiteboard: [post-critsmash-triage]
Flags: qe-verify-
Whiteboard: [adv-main92+r]
Whiteboard: [adv-main92+r] → [adv-main92+r][adv-esr78.14+r]
Whiteboard: [adv-main92+r][adv-esr78.14+r] → [adv-main92+r][adv-esr78.14+r][adv-esr91.0.1+r]
Whiteboard: [adv-main92+r][adv-esr78.14+r][adv-esr91.0.1+r] → [adv-main92+r][adv-esr78.14+r][adv-esr91.1+r]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: