Closed Bug 1674773 Opened 11 months ago Closed 11 months ago

ThreadSanitizer: data race [@ moz_task::TaskRunnable::Release] vs. [@ nsThreadPool::Run]

Categories

(Core :: XPCOM, defect)

defect

Tracking

()

RESOLVED FIXED
84 Branch
Tracking Status
firefox84 --- fixed

People

(Reporter: Gankra, Assigned: Gankra)

References

(Blocks 1 open bug)

Details

Attachments

(4 files, 1 obsolete file)

I am seeing this in my WIP branch to properly instrument the rust stdlib.

The race involves this task and this runnable. It's possible this is just a non-atomic refcounting issue somewhere, but I'm not confident. Things are a bit too virtual/macro-y to have a clear picture.

Marked as security since this is ostensibly a potential Use-After-Free involving a lot of virtual methods.

General information about TSan reports

Why fix races?

Data races are undefined behavior and can cause crashes as well as correctness issues. Compiler optimizations can cause racy code to have unpredictable and hard-to-reproduce behavior.

Rating

If you think this race can cause crashes or correctness issues, it would be great to rate the bug appropriately as P1/P2 and/or indicating this in the bug. This makes it a lot easier for us to assess the actual impact that these reports make and if they are helpful to you.

False Positives / Benign Races

Typically, races reported by TSan are not false positives [1], but it is possible that the race is benign. Even in this case it would be nice to come up with a fix if it is easily doable and does not regress performance. Every race that we cannot fix will have to remain on the suppression list and slows down the overall TSan performance. Also note that seemingly benign races can possibly be harmful (also depending on the compiler, optimizations and the architecture) [2][3].

[1] One major exception is the involvement of uninstrumented code from third-party libraries.
[2] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
[3] How to miscompile programs with "benign" data races: https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf
Suppressing unfixable races

If the bug cannot be fixed, then a runtime suppression needs to be added in mozglue/build/TsanOptions.cpp. The suppressions match on the full stack, so it should be picked such that it is unique to this particular race. The bug number of this bug should also be included so we have some documentation on why this suppression was added.

Group: core-security → dom-core-security

I remember us discussing a while ago that tsan doesn't handle atomic fences super well. I wonder if the cause of this issue is actually the Acquire fence in the xpcom atomic refcounting code here: https://searchfox.org/mozilla-central/rev/16d30bafd4e5276d6d3c632fb52a6c71e739cc44/xpcom/rust/xpcom/src/refptr.rs#301-302

Our C++ implementation of atomic release actually has config options to perform a less efficient operation because tsan doesn't understand the fence operation: https://searchfox.org/mozilla-central/rev/16d30bafd4e5276d6d3c632fb52a6c71e739cc44/xpcom/base/nsISupportsImpl.h#363-374. As does the rust standard library: https://github.com/rust-lang/rust/blob/338f939a8d77061896cd0a1ca87a2c6d1f4ec359/library/alloc/src/sync.rs#L41-L56.

I think the fix to this issue might just be to perform a similar check within refptr.rs.

(In reply to Nika Layzell [:nika] (ni? for response) from comment #1)

I think the fix to this issue might just be to perform a similar check within refptr.rs.

Thanks a lot Nika, I believe this is exactly what we have been looking for!

Alexis, do you feel comfortable for attempting a fix here?

Flags: needinfo?(a.beingessner)

Yeah should be fine,

Assignee: nobody → a.beingessner
Flags: needinfo?(a.beingessner)

This is TSan not understanding our synchronization, apparently, so I'm unhiding.

Group: dom-core-security

Oh boy, the cfg(sanitize="thread") check is behind an unstable feature(cfg_sanitize), time to do some messy nonsense.

This creates a more principled way for us to set all the things our rust
code needs for tsan. In the future, this will also be used to set -Zbuild-std
and the infra it needs. In this patch set, it's needed to set a feature
flag on the xpcom crate.

This makes --enable-thread-sanitizer turn on Rust tsan (-Zsanitizer=thread).
This requires changing SpiderMonkey tsan to use the tsan rust nightly.

In future changes, more Rust tsan integration will key off of MOZ_TSAN.

Attachment #9185885 - Attachment is obsolete: true
Pushed by abeingessner@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/89babec1b885
make tsan builds use consistent flags. r=rstewart
https://hg.mozilla.org/integration/autoland/rev/acd441508a66
Use the dummy-load-as-a-fence trick for refptr.rs. r=nika
Backout by csabou@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/65bcbd8c456a
Backed out 2 changesets for turning Bug 1646925 into almost permafail.
Backout by smolnar@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/5942060b993b
Backed out 2 changesets for turning Bug 1646925 into almost permafail.

These already-flakey tests seem to get pushed over the edge to "almost always
busted" with more instrumented Rust code. Disabling seems like the right approach
to get more coverage elsewhere (and the other similarly-flakey tests suggest
we're probably still hitting the root cause).

Depends on D95950

Pushed by abeingessner@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/378d2e32026b
make tsan builds use consistent flags. r=rstewart
https://hg.mozilla.org/integration/autoland/rev/342247a735ac
Use the dummy-load-as-a-fence trick for refptr.rs. r=nika
https://hg.mozilla.org/integration/autoland/rev/0644208f1414
Disable a few more tests under tsan. r=decoder
Flags: needinfo?(a.beingessner)
You need to log in before you can comment on or make changes to this bug.