ThreadSanitizer: data race [@ assign_assuming_AddRef] vs. [@ get] ([@ mozilla::layers::ImageClientSingle::UpdateImage] vs [@ operator=<mozilla::layers::SyncObjectClient>]
Categories
(Core :: Graphics, defect)
Tracking
()
People
(Reporter: decoder, Assigned: mattwoodrow)
References
(Blocks 1 open bug)
Details
(Keywords: csectype-race, sec-high, Whiteboard: [sec-survey][adv-main84+r][adv-esr78.6+r])
Crash Data
Attachments
(3 files)
22.13 KB, text/plain
47 bytes, text/x-phabricator-request (RyanVM: approval-mozilla-beta+, tjr: sec-approval+)
21.05 KB, patch (jcristau: approval-mozilla-esr78+)
The attached crash information was detected while running CI tests with ThreadSanitizer on try (based on mozilla-central rev dc90a7a18c07).
For detailed crash information, see attachment.
Quick analysis: This is a race on a RefPtr which is potentially dangerous. Marking s-s due to potential use-after-free.
General information about TSan reports
Why fix races?
Data races are undefined behavior and can cause crashes as well as correctness issues. Compiler optimizations can cause racy code to have unpredictable and hard-to-reproduce behavior.
Rating
If you think this race can cause crashes or correctness issues, please rate the bug appropriately as P1/P2 and/or say so in the bug. This makes it much easier for us to assess the actual impact of these reports and whether they are helpful to you.
False Positives / Benign Races
Typically, races reported by TSan are not false positives [1], but it is possible that the race is benign. Even in this case it would be nice to come up with a fix if it is easily doable and does not regress performance. Every race that we cannot fix will have to remain on the suppression list and slow down overall TSan performance. Also note that seemingly benign races can be harmful, depending on the compiler, optimizations and the architecture [2][3].
[1] One major exception is the involvement of uninstrumented code from third-party libraries.
[2] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
[3] How to miscompile programs with "benign" data races: https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf
Suppressing unfixable races
If the bug cannot be fixed, then a runtime suppression needs to be added in mozglue/build/TsanOptions.cpp. The suppressions match on the full stack, so the suppression should be picked such that it is unique to this particular race. The bug number of this bug should also be included so we have some documentation on why this suppression was added.
Reporter
Comment 1•4 years ago
Comment 2•4 years ago
Hey Jim, Can you find an owner for this one? It's a sec-high. Thanks!
Reporter
Comment 3•4 years ago
Based on the signature, this could also be the root cause of bug 1235665, a crash that has been in the wild for years.
Comment 4•4 years ago
Hey Sotaro, curious if you could take a look here? Might be in your area of expertise.
Assignee
Comment 5•4 years ago
This is a bit unfortunate.
ImageBridgeChild implements KnowsCompositor, which means that it claims to know details about a specific compositor, even though it's a singleton in the content process (and there might be multiple windows/compositors that render content from this process).
Each time we create a new tab/BrowserChild, we configure the ImageBridgeChild singleton with the details of the compositor for that tab, overwriting what was previously there. If all tabs in a process belong to the same window, then it should be a no-op, but if there are multiple windows then it might not be.
Generally all windows have the same type of compositor (but not always!), so the details will usually be roughly the same. The one exception is the sync object, which is unique per compositor. If we configure the ImageBridge using the sync handle from one compositor, then attempts to use the sync handle from content in other compositors will be a no-op (and could cause racy rendering).
I guess the simple fix here might be to remove sync handle functionality from ImageBridge, and to add mutexes around writing/reading this KnowsCompositor data.
The real fix is for ImageBridge to not implement KnowsCompositor at all, and for consumers to find the data they need from the actual compositor connection that they want to render to.
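The "simple fix" of guarding the shared data with a mutex can be sketched as follows. This is not the landed patch: `SyncObject`, `CompositorInfo` and `RunDemo` are hypothetical stand-ins, and `std::shared_ptr` is used in place of Gecko's `RefPtr` so the sketch stays self-contained. The idea is that both the overwrite and the read of the reference-counted pointer happen under the same lock, and the reader leaves the lock holding its own strong reference.

```cpp
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for the per-compositor sync object.
struct SyncObject {
  explicit SyncObject(int aCompositorId) : compositorId(aCompositorId) {}
  int compositorId;
};

// Hypothetical stand-in for a process-wide singleton holding compositor data.
class CompositorInfo {
 public:
  // Called whenever a new tab is configured; overwrites the previous value.
  void SetSyncObject(std::shared_ptr<SyncObject> aSync) {
    std::lock_guard<std::mutex> lock(mMutex);
    mSync = std::move(aSync);
  }

  // Copies the pointer under the lock, so the caller owns a strong reference
  // that stays valid even if another thread overwrites mSync afterwards.
  std::shared_ptr<SyncObject> GetSyncObject() const {
    std::lock_guard<std::mutex> lock(mMutex);
    return mSync;
  }

 private:
  mutable std::mutex mMutex;
  std::shared_ptr<SyncObject> mSync;
};

// One thread keeps overwriting the pointer while another keeps reading it.
bool RunDemo() {
  CompositorInfo info;
  info.SetSyncObject(std::make_shared<SyncObject>(0));
  bool ok = true;  // Written only by the reader thread, read after join().

  std::thread writer([&info] {
    for (int i = 1; i <= 1000; ++i) {
      info.SetSyncObject(std::make_shared<SyncObject>(i));
    }
  });
  std::thread reader([&info, &ok] {
    for (int i = 0; i < 1000; ++i) {
      std::shared_ptr<SyncObject> sync = info.GetSyncObject();
      if (!sync || sync->compositorId < 0 || sync->compositorId > 1000) {
        ok = false;
      }
    }
  });
  writer.join();
  reader.join();
  return ok && info.GetSyncObject()->compositorId == 1000;
}
```

Without the lock, the concurrent assignment and read of the same smart pointer object would be a data race of the kind reported here, and the reader could end up with a pointer to an object that has already been released.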
Assignee
Comment 6•4 years ago
Assignee
Comment 7•4 years ago
Comment on attachment 9183379 [details]
Bug 1664831. r?sotaro
Security Approval Request
- How easily could an exploit be constructed based on the patch?: Adding a mutex around a refptr is probably fairly obvious as to the problem, but it might be a bit harder to figure out how to trigger the code.
- Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
- Which older supported branches are affected by this flaw?: All
- If not all supported branches, which bug introduced the flaw?: None
- Do you have backports for the affected branches?: No
- If not, how different, hard to create, and risky will they be?: Should be easy, this code hasn't changed much.
- How likely is this patch to cause regressions; how much testing does it need?: Very low risk, just adds mutex locks around an infrequent (and not highly contended) bit of initialization code.
Comment 8•4 years ago
Comment on attachment 9183379 [details]
Bug 1664831. r?sotaro
sec-approved but please request uplift
Comment 9•4 years ago
We're about to go into RC week. Would it be alright if we punted on this until the next cycle?
Assignee
Comment 12•4 years ago
I've rebased the patch in phabricator.
Comment 13•4 years ago
Comment 14•4 years ago
This grafts cleanly to Beta but will need a rebased patch and approval request for ESR78 still.
Comment 15•4 years ago
Assignee
Comment 16•4 years ago
Assignee
Comment 17•4 years ago
Comment on attachment 9188486 [details] [diff] [review]
Path for esr-78
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration:
- User impact if declined:
- Fix Landed on Version: 84
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky):
- String or UUID changes made by this patch:
Comment 18•4 years ago
Comment on attachment 9183379 [details]
Bug 1664831. r?sotaro
Approved for 84.0b3.
Comment 19•4 years ago
uplift
Comment 20•4 years ago
As part of a security bug pattern analysis, we are requesting your help with a high level analysis of this bug. It is our hope to develop static analysis (or potentially runtime/dynamic analysis) in the future to identify classes of bugs.
Please visit this google form to reply.
Comment 22•3 years ago
Comment on attachment 9188486 [details] [diff] [review]
Path for esr-78
approved for 78.6esr
Comment 23•3 years ago
uplift