Closed Bug 1664831 Opened 4 years ago Closed 4 years ago

ThreadSanitizer: data race [@ assign_assuming_AddRef] vs. [@ get] ([@ mozilla::layers::ImageClientSingle::UpdateImage] vs [@ operator=<mozilla::layers::SyncObjectClient>])

Categories

(Core :: Graphics, defect)

x86_64
Linux
defect

Tracking

()

RESOLVED FIXED
85 Branch
Tracking Status
firefox-esr78 84+ fixed
firefox81 --- wontfix
firefox82 --- wontfix
firefox83 --- wontfix
firefox84 + fixed
firefox85 + fixed

People

(Reporter: decoder, Assigned: mattwoodrow)

References

(Blocks 1 open bug)

Details

(Keywords: csectype-race, sec-high, Whiteboard: [sec-survey][adv-main84+r][adv-esr78.6+r])

Crash Data

Attachments

(3 files)

The attached crash information was detected while running CI tests with ThreadSanitizer on try (based on mozilla-central rev dc90a7a18c07).

For detailed crash information, see attachment.

Quick analysis: This is a race on a RefPtr which is potentially dangerous. Marking s-s due to potential use-after-free.
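To make the use-after-free concern concrete, here is a minimal single-threaded replay of one unlucky interleaving. It is only a sketch: `Counted`, `AddRef`, `Release` and `ReaderTouchesFreedObject` are hypothetical stand-ins, not Gecko's RefPtr code, and a `destroyed` flag replaces the real `delete` so the example never touches freed memory. The assumption is that one thread loads the raw pointer (as a `get()` call would) while another thread assigns a new pointer and releases the old one (as `assign_assuming_AddRef` does).

```cpp
#include <cassert>

// Hypothetical intrusively ref-counted object. "destroyed" stands in for the
// memory being freed, so the replay never dereferences freed memory for real.
struct Counted {
  int refCount = 0;
  bool destroyed = false;
};

void AddRef(Counted* aObj) { ++aObj->refCount; }

void Release(Counted* aObj) {
  if (--aObj->refCount == 0) {
    aObj->destroyed = true;  // a real Release() would `delete` the object here
  }
}

// Replays one interleaving of an unsynchronized reader and writer.
// Returns true if the reader is left holding a pointer to a freed object.
bool ReaderTouchesFreedObject() {
  Counted oldObj, newObj;
  Counted* slot = &oldObj;  // models the shared smart-pointer member
  AddRef(slot);             // the member holds the only reference

  // Reader, step 1: loads the raw pointer without taking a reference yet.
  Counted* raw = slot;

  // Writer: swaps in the new object and drops the reference to the old one.
  AddRef(&newObj);
  Counted* previous = slot;
  slot = &newObj;
  Release(previous);  // refcount reaches zero, object is "freed"

  // Reader, step 2: about to AddRef() the pointer it loaded in step 1.
  return raw->destroyed;
}
```

In this replay the reader's pointer refers to an object whose last reference has already been dropped, which is the window a real race would need to hit.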

General information about TSan reports

Why fix races?

Data races are undefined behavior and can cause crashes as well as correctness issues. Compiler optimizations can cause racy code to have unpredictable and hard-to-reproduce behavior.

Rating

If you think this race can cause crashes or correctness issues, please rate the bug appropriately as P1/P2 and/or indicate this in the bug. This makes it much easier for us to assess the actual impact of these reports and whether they are helpful to you.

False Positives / Benign Races

Typically, races reported by TSan are not false positives [1], but it is possible that the race is benign. Even in this case it would be nice to come up with a fix if it is easily doable and does not regress performance. Every race that we cannot fix has to remain on the suppression list and slows down overall TSan performance. Also note that seemingly benign races can be harmful (depending on the compiler, optimizations and the architecture) [2][3].

[1] One major exception is the involvement of uninstrumented code from third-party libraries.
[2] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
[3] How to miscompile programs with "benign" data races: https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf

Suppressing unfixable races

If the bug cannot be fixed, then a runtime suppression needs to be added in mozglue/build/TsanOptions.cpp. Suppressions match on the full stack, so the entry should be picked such that it is unique to this particular race. The bug number of this bug should also be included so we have some documentation on why the suppression was added.
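As a rough illustration of what such an entry looks like, the sketch below uses TSan's `__tsan_default_suppressions()` hook, which returns a newline-separated list of suppressions. The specific entry is an assumption made up for this example (this bug was ultimately fixed rather than suppressed), and `HasSuppressionFor` is a hypothetical helper that is not part of TsanOptions.cpp.

```cpp
#include <cassert>
#include <cstring>

// TSan calls this hook at startup to obtain built-in suppressions.
// The entry below is illustrative only; it is not taken from the real file.
extern "C" const char* __tsan_default_suppressions() {
  return "# Add your suppressions below\n"
         // Bug 1664831 (illustrative entry, keyed on a frame from this report)
         "race:mozilla::layers::ImageClientSingle::UpdateImage\n"
         // End of suppressions.
         ;
}

// Hypothetical helper: checks whether the list mentions a given function.
bool HasSuppressionFor(const char* aFunction) {
  return std::strstr(__tsan_default_suppressions(), aFunction) != nullptr;
}
```

Keeping the bug number in a comment next to the entry is what lets a later reader trace the suppression back to its report.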

Group: core-security → gfx-core-security

Hey Jim, Can you find an owner for this one? It's a sec-high. Thanks!

Flags: needinfo?(jmathies)
Blocks: gfx-triage
Flags: needinfo?(jmathies)
Crash Signature: [@ mozilla::layers::ImageClientSingle::UpdateImage] [@ mozilla::layers::KnowsCompositor::IdentifyTextureHost]

Based on the signature, this could also be the root cause of bug 1235665, a crash that has been in the wild for years.

Hey Sotaro, curious if you could take a look here? Might be in your area of expertise.

Flags: needinfo?(sotaro.ikeda.g)

This is a bit unfortunate.

ImageBridgeChild implements KnowsCompositor, which means that it claims to know details about a specific compositor, even though it's a singleton in the content process (and there might be multiple windows/compositors that render content from this process).

Each time we create a new tab/BrowserChild, we configure the ImageBridgeChild singleton with the details of the compositor for that tab, overwriting what was previously there. If all tabs in a process belong to the same window, then it should be a no-op, but if there are multiple windows then it might not be.

Generally all windows have the same type of compositor (but not always!), so the details will usually be roughly the same. The one exception is the sync object, which is unique per compositor. If we configure the ImageBridge using the sync handle from one compositor, then attempts to use the sync handle from content in other compositors will be a no-op (and could cause racy rendering).

I guess the simple fix here might be to remove sync handle functionality from ImageBridge, and to add mutexes around writing/reading this KnowsCompositor data.
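A minimal sketch of the mutex part of that idea, assuming standard-library stand-ins: `std::shared_ptr` and `std::mutex` take the place of Gecko's RefPtr and Mutex, and `SyncObject`, `CompositorInfo` and `RunConcurrentAccess` are hypothetical names invented for this example. It is not the landed patch.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for mozilla::layers::SyncObjectClient.
struct SyncObject {
  int compositorId;
};

// Hypothetical stand-in for the KnowsCompositor data held by the singleton.
class CompositorInfo {
 public:
  // Writer: called whenever a new tab reconfigures the singleton.
  void SetSyncObject(std::shared_ptr<SyncObject> aSync) {
    std::lock_guard<std::mutex> lock(mMutex);
    mSyncObject = std::move(aSync);
  }

  // Reader: copies a strong reference while holding the lock, so the object
  // stays alive even if a writer replaces mSyncObject right afterwards.
  std::shared_ptr<SyncObject> GetSyncObject() const {
    std::lock_guard<std::mutex> lock(mMutex);
    return mSyncObject;
  }

 private:
  mutable std::mutex mMutex;
  std::shared_ptr<SyncObject> mSyncObject;
};

// Runs one writer and one reader concurrently and returns how many reads
// observed a valid object.
int RunConcurrentAccess(int aIterations) {
  CompositorInfo info;
  info.SetSyncObject(std::make_shared<SyncObject>(SyncObject{0}));
  int validReads = 0;  // only written by the reader thread, read after join()

  std::thread writer([&] {
    for (int i = 1; i <= aIterations; ++i) {
      info.SetSyncObject(std::make_shared<SyncObject>(SyncObject{i}));
    }
  });
  std::thread reader([&] {
    for (int i = 0; i < aIterations; ++i) {
      std::shared_ptr<SyncObject> sync = info.GetSyncObject();
      if (sync && sync->compositorId >= 0) {
        ++validReads;
      }
    }
  });

  writer.join();
  reader.join();
  return validReads;
}
```

The important design point is that the getter returns a strong reference rather than a raw pointer, so the lock only needs to cover the copy and not the caller's later use of the object.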

The real fix is for ImageBridge to not implement KnowsCompositor at all, and for consumers to find the data they need from the actual compositor connection that they want to render to.

Attached file Bug 1664831. r?sotaro
Assignee: nobody → matt.woodrow
Status: NEW → ASSIGNED

Comment on attachment 9183379 [details]
Bug 1664831. r?sotaro

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: Adding a mutex around a RefPtr probably makes the problem fairly obvious, but it might be a bit harder to figure out how to trigger the code.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?: All
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: Should be easy, this code hasn't changed much.
  • How likely is this patch to cause regressions; how much testing does it need?: Very low risk, just adds mutex locks around an infrequent (and not highly contended) bit of initialization code.
Attachment #9183379 - Flags: sec-approval?

Comment on attachment 9183379 [details]
Bug 1664831. r?sotaro

sec-approved but please request uplift

Attachment #9183379 - Flags: sec-approval?
Attachment #9183379 - Flags: sec-approval+
Attachment #9183379 - Flags: approval-mozilla-beta?

We're about to go into RC week. Would it be alright if we punted on this until the next cycle?

Flags: needinfo?(tom)

Yes

Flags: needinfo?(tom)

This needs rebasing before it can land.

Flags: needinfo?(matt.woodrow)

I've rebased the patch in phabricator.

Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(matt.woodrow)

This grafts cleanly to Beta but will need a rebased patch and approval request for ESR78 still.

Flags: needinfo?(matt.woodrow)
Group: gfx-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 85 Branch
Attached patch Path for esr-78
Flags: needinfo?(matt.woodrow)

Comment on attachment 9188486 [details] [diff] [review]
Path for esr-78

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration:
  • User impact if declined:
  • Fix Landed on Version: 84
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky):
  • String or UUID changes made by this patch:
Attachment #9188486 - Attachment is patch: true
Attachment #9188486 - Attachment mime type: application/octet-stream → text/plain
Attachment #9188486 - Flags: approval-mozilla-esr78?

Comment on attachment 9183379 [details]
Bug 1664831. r?sotaro

Approved for 84.0b3.

Attachment #9183379 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

As part of a security bug pattern analysis, we are requesting your help with a high level analysis of this bug. It is our hope to develop static analysis (or potentially runtime/dynamic analysis) in the future to identify classes of bugs.

Please visit this google form to reply.

Flags: needinfo?(matt.woodrow)
Whiteboard: [sec-survey]

Done!

Flags: needinfo?(matt.woodrow)
Flags: qe-verify-

Comment on attachment 9188486 [details] [diff] [review]
Path for esr-78

approved for 78.6esr

Attachment #9188486 - Flags: approval-mozilla-esr78? → approval-mozilla-esr78+
Whiteboard: [sec-survey] → [sec-survey][adv-main84+r]
Whiteboard: [sec-survey][adv-main84+r] → [sec-survey][adv-main84+r][adv-esr78.6+r]
No longer blocks: gfx-triage
Group: core-security-release