Suppress new TSAN failures for WebRender
Categories
(Core :: Graphics: WebRender, task, P3)
Tracking
()
People
(Reporter: aosmond, Assigned: aosmond)
References
Details
Attachments
(4 files)
There are a few tsan warnings we need to suppress as they are likely false positives.
Assignee | ||
Comment 1•3 years ago
|
||
This is uninstrumented driver code we pull in using llvmpipe as our rendering backend.
Assignee | ||
Comment 2•3 years ago
|
||
This is a benign race creating worker/SW compositor threads.
Assignee | ||
Comment 3•3 years ago
|
||
lsalzman indicated this is likely a false positive
Assignee | ||
Comment 4•3 years ago
|
||
Comment 6•3 years ago
|
||
(In reply to Andrew Osmond [:aosmond] from comment #3)
Created attachment 9233511 [details]
data race, scale_bitlsalzman indicated this is likely a false positive
TSan should not produce false positives, can you please elaborate why this should be one?
Also, it is preferable to not simply suppress benign data races, we would like to fix all races if possible (the swrast_dri one seems fine).
Assignee | ||
Comment 7•3 years ago
|
||
(In reply to Christian Holler (:decoder) from comment #6)
(In reply to Andrew Osmond [:aosmond] from comment #3)
Created attachment 9233511 [details]
data race, scale_bitlsalzman indicated this is likely a false positive
TSan should not produce false positives, can you please elaborate why this should be one?
Lee, can you elaborate on the scale_blit for posterity? I know I've asked a few times about this one :).
Also, it is preferable to not simply suppress benign data races, we would like to fix all races if possible (the swrast_dri one seems fine).
Yep I agree, I've been fixing them preferentially. See bug 1722723, bug 1722725. Some of them are quite tricky though and we are running without WebRender coverage at all right now.
Comment 8•3 years ago
|
||
Regarding scale_blit, I spent weeks investigating that and could find no legitimate reason for the warning it was giving. It was potentially confused by the band allocation scheme we use for threads, and that's my best guess. Otherwise, I can only conclude it was a false positive.
Comment 9•3 years ago
|
||
bugherder |
Comment 10•3 years ago
|
||
(In reply to Lee Salzman [:lsalzman] from comment #8)
Regarding scale_blit, I spent weeks investigating that and could find no legitimate reason for the warning it was giving. It was potentially confused by the band allocation scheme we use for threads, and that's my best guess. Otherwise, I can only conclude it was a false positive.
Per our discussion on Matrix I did a try push with instrumentation and it shows overlapping memcpy:
616196:[task 2021-07-29T00:56:15.837Z] 00:56:15 INFO - DBGXXX[1678] memcpy 0x7fbcb5901000 3200 <-- this happens a few hundred memcpys before
616771:[task 2021-07-29T00:56:16.114Z] 00:56:16 INFO - DBGXXX[1512] memcpy 0x7fbcb5901028 1152 <-- this is the memcpy triggering the TSan error
616775:[task 2021-07-29T00:56:16.117Z] 00:56:16 INFO - Write of size 8 at 0x7fbcb5901028 by thread T25:
616800:[task 2021-07-29T00:56:16.133Z] 00:56:16 INFO - Previous write of size 8 at 0x7fbcb5901028 by thread T1:
Also note that the threads in question are of different type, one is "Renderer" the other is "SwComposite".
Full log here: https://treeherder.mozilla.org/logviewer?job_id=346670576&repo=try&lineNumber=616892
Updated•3 years ago
|
Comment 11•3 years ago
|
||
As far as I can see in that log, everything looks well-ordered. There is dependency tracking on which surfaces get composited into the destination framebuffer. These source surfaces may overlap, but the child surfaces only get composited once the parent surface is fully composited. This dependency tracking is probably just confusing TSAN?
Comment 12•3 years ago
|
||
(In reply to Lee Salzman [:lsalzman] from comment #11)
As far as I can see in that log, everything looks well-ordered. There is dependency tracking on which surfaces get composited into the destination framebuffer. These source surfaces may overlap, but the child surfaces only get composited once the parent surface is fully composited. This dependency tracking is probably just confusing TSAN?
What TSan does is to observe the two overlapping memcpys (from "Renderer" which I assume is the parent? and "SwComposite") without a synchronization of some sort (Atomics/Locks) inbetween them. The fact that the log shows these memcpys in a particular order has no impact on the report, for TSan it only counts that a suitable sync primitive is between them (which is lacking here). Can you point me to the code that ensures that the childs only start when the parent is done?
Comment 13•3 years ago
|
||
(In reply to Christian Holler (:decoder) from comment #12)
(In reply to Lee Salzman [:lsalzman] from comment #11)
As far as I can see in that log, everything looks well-ordered. There is dependency tracking on which surfaces get composited into the destination framebuffer. These source surfaces may overlap, but the child surfaces only get composited once the parent surface is fully composited. This dependency tracking is probably just confusing TSAN?
What TSan does is to observe the two overlapping memcpys (from "Renderer" which I assume is the parent? and "SwComposite") without a synchronization of some sort (Atomics/Locks) inbetween them. The fact that the log shows these memcpys in a particular order has no impact on the report, for TSan it only counts that a suitable sync primitive is between them (which is lacking here). Can you point me to the code that ensures that the childs only start when the parent is done?
https://searchfox.org/mozilla-central/source/gfx/wr/webrender/src/compositor/sw_compositor.rs#405
Comment 14•3 years ago
|
||
Comment on attachment 9233512 [details]
Bug 1722721 - Suppress new TSAN failures for WebRender.
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: test only changes to get testing on webrender specifically and turn off non-webrender where reasonable. Please land after patch on Bug 1722239 and Bug 1721945.
- User impact if declined:
- Fix Landed on Version: 92
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky):
- String or UUID changes made by this patch:
Comment 15•3 years ago
|
||
Comment on attachment 9233512 [details]
Bug 1722721 - Suppress new TSAN failures for WebRender.
test-only, approved for esr91
Comment 16•3 years ago
|
||
bugherder uplift |
Updated•2 years ago
|
Updated•7 months ago
|
Description
•