Closed Bug 1722721 Opened 3 years ago Closed 7 months ago

Suppress new TSAN failures for WebRender

Categories

(Core :: Graphics: WebRender, task, P3)

Desktop
Linux
task

Tracking

()

RESOLVED INCOMPLETE
92 Branch
Tracking Status
firefox-esr91 --- fixed
firefox92 --- wontfix

People

(Reporter: aosmond, Assigned: aosmond)

References

Details

Attachments

(4 files)

There are a few tsan warnings we need to suppress as they are likely false positives.

This is uninstrumented driver code we pull in using llvmpipe as our rendering backend.

This is a benign race creating worker/SW compositor threads.

Attached file data race, scale_bit

lsalzman indicated this is likely a false positive

Blocks: 1722726
Pushed by aosmond@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/051897a0e6e7 Suppress new TSAN failures for WebRender. r=gfx-reviewers,jrmuizel

(In reply to Andrew Osmond [:aosmond] from comment #3)

Created attachment 9233511 [details]
data race, scale_bit

lsalzman indicated this is likely a false positive

TSan should not produce false positives, can you please elaborate why this should be one?

Also, it is preferable to not simply suppress benign data races, we would like to fix all races if possible (the swrast_dri one seems fine).

Flags: needinfo?(aosmond)

(In reply to Christian Holler (:decoder) from comment #6)

(In reply to Andrew Osmond [:aosmond] from comment #3)

Created attachment 9233511 [details]
data race, scale_bit

lsalzman indicated this is likely a false positive

TSan should not produce false positives, can you please elaborate why this should be one?

Lee, can you elaborate on the scale_blit for posterity? I know I've asked a few times about this one :).

Also, it is preferable to not simply suppress benign data races, we would like to fix all races if possible (the swrast_dri one seems fine).

Yep I agree, I've been fixing them preferentially. See bug 1722723, bug 1722725. Some of them are quite tricky though and we are running without WebRender coverage at all right now.

Flags: needinfo?(aosmond) → needinfo?(lsalzman)

Regarding scale_blit, I spent weeks investigating that and could find no legitimate reason for the warning it was giving. It was potentially confused by the band allocation scheme we use for threads, and that's my best guess. Otherwise, I can only conclude it was a false positive.

Flags: needinfo?(lsalzman)
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 92 Branch

(In reply to Lee Salzman [:lsalzman] from comment #8)

Regarding scale_blit, I spent weeks investigating that and could find no legitimate reason for the warning it was giving. It was potentially confused by the band allocation scheme we use for threads, and that's my best guess. Otherwise, I can only conclude it was a false positive.

Per our discussion on Matrix I did a try push with instrumentation and it shows overlapping memcpy:

616196:[task 2021-07-29T00:56:15.837Z] 00:56:15     INFO - DBGXXX[1678] memcpy 0x7fbcb5901000 3200           <-- this happens a few hundred memcpys before
616771:[task 2021-07-29T00:56:16.114Z] 00:56:16     INFO - DBGXXX[1512] memcpy 0x7fbcb5901028 1152           <-- this is the memcpy triggering the TSan error
616775:[task 2021-07-29T00:56:16.117Z] 00:56:16     INFO -   Write of size 8 at 0x7fbcb5901028 by thread T25:
616800:[task 2021-07-29T00:56:16.133Z] 00:56:16     INFO -   Previous write of size 8 at 0x7fbcb5901028 by thread T1:

Also note that the threads in question are of different type, one is "Renderer" the other is "SwComposite".

Full log here: https://treeherder.mozilla.org/logviewer?job_id=346670576&repo=try&lineNumber=616892

Status: RESOLVED → REOPENED
Flags: needinfo?(lsalzman)
Resolution: FIXED → ---

As far as I can see in that log, everything looks well-ordered. There is dependency tracking on which surfaces get composited into the destination framebuffer. These source surfaces may overlap, but the child surfaces only get composited once the parent surface is fully composited. This dependency tracking is probably just confusing TSAN?

Flags: needinfo?(lsalzman)

(In reply to Lee Salzman [:lsalzman] from comment #11)

As far as I can see in that log, everything looks well-ordered. There is dependency tracking on which surfaces get composited into the destination framebuffer. These source surfaces may overlap, but the child surfaces only get composited once the parent surface is fully composited. This dependency tracking is probably just confusing TSAN?

What TSan does is to observe the two overlapping memcpys (from "Renderer" which I assume is the parent? and "SwComposite") without a synchronization of some sort (Atomics/Locks) inbetween them. The fact that the log shows these memcpys in a particular order has no impact on the report, for TSan it only counts that a suitable sync primitive is between them (which is lacking here). Can you point me to the code that ensures that the childs only start when the parent is done?

Flags: needinfo?(lsalzman)

(In reply to Christian Holler (:decoder) from comment #12)

(In reply to Lee Salzman [:lsalzman] from comment #11)

As far as I can see in that log, everything looks well-ordered. There is dependency tracking on which surfaces get composited into the destination framebuffer. These source surfaces may overlap, but the child surfaces only get composited once the parent surface is fully composited. This dependency tracking is probably just confusing TSAN?

What TSan does is to observe the two overlapping memcpys (from "Renderer" which I assume is the parent? and "SwComposite") without a synchronization of some sort (Atomics/Locks) inbetween them. The fact that the log shows these memcpys in a particular order has no impact on the report, for TSan it only counts that a suitable sync primitive is between them (which is lacking here). Can you point me to the code that ensures that the childs only start when the parent is done?

https://searchfox.org/mozilla-central/source/gfx/wr/webrender/src/compositor/sw_compositor.rs#405

Flags: needinfo?(lsalzman)

Comment on attachment 9233512 [details]
Bug 1722721 - Suppress new TSAN failures for WebRender.

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: test only changes to get testing on webrender specifically and turn off non-webrender where reasonable. Please land after patch on Bug 1722239 and Bug 1721945.
  • User impact if declined:
  • Fix Landed on Version: 92
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky):
  • String or UUID changes made by this patch:
Attachment #9233512 - Flags: approval-mozilla-esr91?

Comment on attachment 9233512 [details]
Bug 1722721 - Suppress new TSAN failures for WebRender.

test-only, approved for esr91

Attachment #9233512 - Flags: approval-mozilla-esr91? → approval-mozilla-esr91+
Status: REOPENED → RESOLVED
Closed: 3 years ago7 months ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: