ThreadSanitizer: data race [@ assign_assuming_AddRef] vs. [@ get] with nsHtml5StreamParser
Categories
(Core :: DOM: HTML Parser, defect)
Tracking
()
People
(Reporter: tsmith, Assigned: hsivonen)
References
(Blocks 2 open bugs)
Details
(Keywords: csectype-race, csectype-uaf, sec-moderate, Whiteboard: [adv-main92+r][adv-esr78.14+r][adv-esr91.1+r])
Attachments
(2 files)
23.48 KB,
text/plain
|
Details | |
48 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-esr78+
RyanVM
:
approval-mozilla-esr91+
|
Details | Review |
The attached crash information was detected by ThreadSanitizer while fuzzing build mozilla-central 20210716-160071ad9de0. I will attach the test case once it is reduced.
General information about TSan reports
Why fix races?
Data races are undefined behavior and can cause crashes as well as correctness issues. Compiler optimizations can cause racy code to have unpredictable and hard-to-reproduce behavior.
Rating
If you think this race can cause crashes or correctness issues, it would be great to rate the bug appropriately as P1/P2 and/or indicating this in the bug. This makes it a lot easier for us to assess the actual impact that these reports make and if they are helpful to you.
False Positives / Benign Races
Typically, races reported by TSan are not false positives [1], but it is possible that the race is benign. Even in this case it would be nice to come up with a fix if it is easily doable and does not regress performance. Every race that we cannot fix will have to remain on the suppression list and slows down the overall TSan performance. Also note that seemingly benign races can possibly be harmful (also depending on the compiler, optimizations and the architecture) [2][3].
[1] One major exception is the involvement of uninstrumented code from third-party libraries.
[2] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
[3] How to miscompile programs with "benign" data races: https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf
Suppressing unfixable races
If the bug cannot be fixed, then a runtime suppression needs to be added in mozglue/build/TsanOptions.cpp. The suppressions match on the full stack, so it should be picked such that it is unique to this particular race. The bug number of this bug should also be included so we have some documentation on why this suppression was added.
Comment 1•3 years ago
|
||
This looks like a potential use-after-free to me.
Updated•3 years ago
|
Assignee | ||
Comment 2•3 years ago
|
||
The stacks here appears to explain bug 1642314.
Bug 1642314 comment 1 explains the (evidently incorrect) reasoning why the code is the way it is in the first place.
Bug 1642086 is the opposite of this one. Bug 1642086 comment 3 suggests a bit silly-looking but correct fix for that bug. Extending that idea to this bug would involve adding yet another runnable round trip, but that seems even sillier.
Perhaps this pointer needs to become atomic.
Leaving myself needinfoed so I don't forget about this.
Updated•3 years ago
|
Assignee | ||
Comment 3•3 years ago
•
|
||
Making the pointer ReleaseAcquire
atomic should work. nsHtml5StreamListener
itself is thread-safely refcounted, so it itself stays alive and reachable by the Necko thread. If the delegate is atomically exchanged to nullptr
, if the null checks are changed to atomic load, it should be true that if the delegate has been dropped on the main thread to the point that the stream parser's destructor has already run (on the main thread) and the delegate is, therefore, no longer usable, the load should see a nullptr
. AFAICT, ReleaseAcquire
would guarantee a relationship between the non-atomic destructor code and the atomic write. I.e. if the atomic write of nullptr
should be guaranteed to be observable on other threads before it's possible to reach destructed stream parser from the Necko thread.
It's somewhat unfortunate that this means that every Necko event makes the core running the Necko thread sync cache with the core that runs the main thread, but I expect there to be enough actual locks for this to be negligible compared to whatever else is going on anyway.
Assignee | ||
Comment 4•3 years ago
|
||
Hmm. That's not quite right, because that doesn't ensure that whatever the Necko thread does with the obtained pointer happens before the main-thread destructor of the pointee.
The relationship between the parser thread and the main thread is managed in such a way, that we always make the parser thread no longer operate on a given stream parser before the stream parser goes away, but I guess we don't truly have a guarantee that Necko stops delivering events to the listener before the listener's delegate has been destructed.
It looks a lot like we need a real mutex here. :-(
Assignee | ||
Comment 5•3 years ago
|
||
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Updated•3 years ago
|
Comment 6•3 years ago
|
||
Landed:
https://hg.mozilla.org/integration/autoland/rev/f29e3aea6a16b57c02a4ca2d84ce19039144782c
Backed out for multiple failures in Mutex_posix.cpp:
https://hg.mozilla.org/integration/autoland/rev/bcf78d41751a094c06a70ed9e903be200063fdb9
Push with failures: https://treeherder.mozilla.org/jobs?repo=autoland&group_state=expanded&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel&revision=f29e3aea6a16b57c02a4ca2d84ce19039144782c
Failure log example: https://treeherder.mozilla.org/logviewer?job_id=348249763&repo=autoland
[Parent 1540, Main Thread] WARNING: NS_ENSURE_TRUE(rootFrame) failed: file /builds/worker/checkouts/gecko/dom/base/nsGlobalWindowOuter.cpp:4232
###!!! ERROR: Potential deadlock detected:
=== Cyclical dependency starts at
--- Mutex : nsHtml5StreamListener mDelegateMutex (currently acquired)
calling context
[stack trace unavailable]
=== Cycle completed at
--- Mutex : nsHtml5StreamListener mDelegateMutex (currently acquired)
calling context
[stack trace unavailable]
###!!! Deadlock may happen NOW!
[Parent 1540, Main Thread] ###!!! ASSERTION: Potential deadlock detected:
Cyclical dependency starts at
Mutex : nsHtml5StreamListener mDelegateMutex (currently acquired)
Cycle completed at
Mutex : nsHtml5StreamListener mDelegateMutex (currently acquired)
###!!! Deadlock may happen NOW!
: 'Error', file /builds/worker/checkouts/gecko/xpcom/threads/BlockingResourceBase.cpp:245
stack-fixing for the first stack frame, this may take a while...
#01: NS_DebugBreak [xpcom/base/nsDebugImpl.cpp:431]
#02: mozilla::BlockingResourceBase::CheckAcquire() [xpcom/threads/BlockingResourceBase.cpp:247]
#03: mozilla::OffTheBooksMutex::Lock() [xpcom/threads/BlockingResourceBase.cpp:311]
#04: nsHtml5StreamListener::GetDelegate() [parser/html/nsHtml5StreamListener.cpp:51]
#05: nsHtml5Parser::GetChannel(nsIChannel**) [parser/html/nsHtml5Parser.cpp:108]
#06: nsHtml5Parser::StartTokenizer(bool) [parser/html/nsHtml5Parser.cpp:673]
#07: nsHtml5StreamParser::OnStartRequest(nsIRequest*) [parser/html/nsHtml5StreamParser.cpp:1234]
#08: nsHtml5StreamListener::OnStartRequest(nsIRequest*) [parser/html/nsHtml5StreamListener.cpp:68]
#09: nsDocumentOpenInfo::OnStartRequest(nsIRequest*) [uriloader/base/nsURILoader.cpp:166]
#10: mozilla::net::DocumentLoadListener::ResumeSuspendedChannel(nsIStreamListener*) [netwerk/ipc/DocumentLoadListener.cpp:1296]
#11: mozilla::net::DocumentLoadListener::FinishReplacementChannelSetup(nsresult) [netwerk/ipc/DocumentLoadListener.cpp:1206]
#12: mozilla::net::DocumentLoadListener::RedirectToRealChannelFinished(nsresult) [netwerk/ipc/DocumentLoadListener.cpp:1147]
#13: mozilla::MozPromise<nsresult, mozilla::ipc::ResponseRejectReason, true>::ThenValue<mozilla::net::DocumentLoadListener::TriggerRedirectToRealChannel(mozilla::Maybe<mozilla::dom::ContentParent*> const&)::$_24, mozilla::net::DocumentLoadListener::TriggerRedirectToRealChannel(mozilla::Maybe<mozilla::dom::ContentParent*> const&)::$_25>::DoResolveOrRejectInternal(mozilla::MozPromise<nsresult, mozilla::ipc::ResponseRejectReason, true>::ResolveOrRejectValue&) [xpcom/threads/MozPromise.h:846]
#14: mozilla::MozPromise<nsresult, mozilla::ipc::ResponseRejectReason, true>::ThenValueBase::ResolveOrRejectRunnable::Run() [xpcom/threads/MozPromise.h:488]
#15: mozilla::RunnableTask::Run() [xpcom/threads/TaskController.cpp:503]
#16: mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) [xpcom/threads/TaskController.cpp:805]
#17: mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) [xpcom/threads/TaskController.cpp:641]
#18: mozilla::TaskController::ProcessPendingMTTask(bool) [xpcom/threads/TaskController.cpp:425]
#19: mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::$_0>::Run() [xpcom/threads/nsThreadUtils.h:533]
#20: nsThread::ProcessNextEvent(bool, bool*) [xpcom/threads/nsThread.cpp:1152]
#21: NS_ProcessNextEvent(nsIThread*, bool) [xpcom/threads/nsThreadUtils.cpp:466]
#22: mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) [ipc/glue/MessagePump.cpp:85]
#23: MessageLoop::RunInternal() [ipc/chromium/src/base/message_loop.cc:331]
#24: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:307]
#25: nsBaseAppShell::Run() [widget/nsBaseAppShell.cpp:139]
#26: nsAppStartup::Run() [toolkit/components/startup/nsAppStartup.cpp:275]
#27: XREMain::XRE_mainRun() [toolkit/xre/nsAppRunner.cpp:5287]
#28: XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) [toolkit/xre/nsAppRunner.cpp:5479]
#29: XRE_main(int, char**, mozilla::BootstrapConfig const&) [toolkit/xre/nsAppRunner.cpp:5538]
#30: ??? [/builds/worker/workspace/build/application/firefox/firefox + 0x41985]
#31: __libc_start_main [/lib/x86_64-linux-gnu/libc.so.6 + 0x21b97]
#32: ??? [/builds/worker/workspace/build/application/firefox/firefox + 0x41569]
#33: ??? (???:???)
Updated•3 years ago
|
Assignee | ||
Comment 7•3 years ago
|
||
Let's try a ReentrantMonitor
instead.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=33c459b7f54c28a42afbc402846d44e6afbc9a8d
Assignee | ||
Comment 8•3 years ago
|
||
Comment 9•3 years ago
|
||
Protect nsHtml5StreamListener::mDelegate with a monitor. r=smaug
https://hg.mozilla.org/integration/autoland/rev/8e368aa0f1e545579557f8409705e1c9f72e6ef1
https://hg.mozilla.org/mozilla-central/rev/8e368aa0f1e5
Comment 10•3 years ago
|
||
Please nominate this for Beta & ESR91 approval when you get a chance. It also grafts pretty cleanly to ESR78 (just a minor moz.build conflict), but I'm not sure if it's worth given how close to EOL it is. Feel free to nominate if you think we should take it, though.
Assignee | ||
Comment 11•3 years ago
•
|
||
Comment on attachment 9235978 [details]
Bug 1724101 - Protect nsHtml5StreamListener::mDelegate with a monitor.
Beta/Release Uplift Approval Request
- User impact if declined: A use-after-free seems possible under an unlikely thread scheduling combination: If the Necko thread in unscheduled between a null check and object use such that the main thread manages to complete an event loop spin in between.
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce: Can't provide steps to repro, since the bad scenario depends on very unlikely thread scheduling combination.
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): This patch is low-risk, because it adds a thread synchronization mechanism without changing other code flow. The synchronization primitive should be unable to deadlock with other synchronization mechanism.
- String changes made/needed: None
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: Theoretically possible—even if in practice very unlikely—use-after-free.
- User impact if declined: A use-after-free seems possible under an unlikely thread scheduling combination: If the Necko thread in unscheduled between a null check and object use such that the main thread manages to complete an event loop spin in between.
- Fix Landed on Version: 93
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): This patch is low-risk, because it adds a thread synchronization mechanism without changing other code flow. The synchronization primitive should be unable to deadlock with other synchronization mechanism.
- String or UUID changes made by this patch: None
Comment 12•3 years ago
|
||
Comment on attachment 9235978 [details]
Bug 1724101 - Protect nsHtml5StreamListener::mDelegate with a monitor.
Approved for 92.0b4, 91.1esr, and 78.14esr.
Comment 13•3 years ago
|
||
uplift |
Comment 14•3 years ago
|
||
uplift |
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•2 years ago
|
Description
•