ThreadSanitizer: lock-order-inversion (potential deadlock) [@ PR_Lock] in SanctionsTestServer
Categories
(Core :: Security: PSM, defect, P1)
Tracking
()
People
(Reporter: decoder, Assigned: decoder)
References
(Blocks 1 open bug)
Details
(Whiteboard: [psm-assigned])
Attachments
(2 files)
The attached crash information was detected while running CI tests with ThreadSanitizer on mozilla-central revision ee653474267a.
This potential deadlock report refers to the SanctionsTestServer
used in one XPCShell test. Filing this so we are aware of it and I will suppress the issue for now.
General information about TSan reports
Why fix races?
Data races are undefined behavior and can cause crashes as well as correctness issues. Compiler optimizations can cause racy code to have unpredictable and hard-to-reproduce behavior.
Rating
If you think this race can cause crashes or correctness issues, it would be great to rate the bug appropriately as P1/P2 and/or indicating this in the bug. This makes it a lot easier for us to assess the actual impact that these reports make and if they are helpful to you.
False Positives / Benign Races
Typically, races reported by TSan are not false positives [1], but it is possible that the race is benign. Even in this case it would be nice to come up with a fix if it is easily doable and does not regress performance. Every race that we cannot fix will have to remain on the suppression list and slows down the overall TSan performance. Also note that seemingly benign races can possibly be harmful (also depending on the compiler, optimizations and the architecture) [2][3].
[1] One major exception is the involvement of uninstrumented code from third-party libraries.
[2] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
[3] How to miscompile programs with "benign" data races: https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf
Suppressing unfixable races
If the bug cannot be fixed, then a runtime suppression needs to be added in mozglue/build/TsanOptions.cpp
. The suppressions match on the full stack, so it should be picked such that it is unique to this particular race. The bug number of this bug should also be included so we have some documentation on why this suppression was added.
Assignee | ||
Comment 1•4 years ago
|
||
Assignee | ||
Comment 2•4 years ago
|
||
It looks like OCSPStaplingServer
has the same problem and the test using that also has intermittent failures (not sure if these are related).
Comment 3•4 years ago
|
||
This is basically that DoSNISocketConfig
in both utilities needs mutual exclusion -- at least as ConfigSecureServerWithNamedCert
is designed, since it's reusing the internal slot.
I'm pretty sure this is a false positive. These servers only run on a single thread (the other thread just polls the parent to see if it has died - it doesn't touch any NSS stuff). Going by comment 0 this warning has already been suppressed, so I'm going to close this wfm.
Assignee | ||
Comment 5•4 years ago
|
||
(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #4)
I'm pretty sure this is a false positive. These servers only run on a single thread (the other thread just polls the parent to see if it has died - it doesn't touch any NSS stuff). Going by comment 0 this warning has already been suppressed, so I'm going to close this wfm.
TSan does not produce these types of false positive to our knowledge. Suppressions are temporary to get the tests to work but they are not a permanent solution (they cost performance).
:jcj seems to have found the problem in comment 3 and that should be addressed.
If you still believe this to be a false positive, then we need to find the source of the false positive and fix it, because we can't afford this tool to produce false positives in the first place. We can also run this test with TSAN_OPTIONS=second_deadlock_stack=1
for a better report.
Comment 6•4 years ago
|
||
Dana's right of course, this is single-threaded, so even with lock-order inversion it's not a race. Besides which there's no race in the Firefox code, this is a mochitest service which was identified.
Mutual exclusion into DoSNISocketConfig could still be done, but it's literally single-threaded, so it'd just be to satisfy TSAN.
I agree, re-resolving WORKSFORME.
For future reference, this appears to be a known issue: https://github.com/google/sanitizers/issues/488 (thanks :kjacobs!)
Assignee | ||
Comment 8•4 years ago
|
||
(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #7)
For future reference, this appears to be a known issue: https://github.com/google/sanitizers/issues/488 (thanks :kjacobs!)
Thanks for figuring this out, this is valuable for us to know! I will also reach out to Dmitry and see if there is any progress on the issue.
(In reply to J.C. Jones [:jcj] (he/him) from comment #6)
I agree, re-resolving WORKSFORME.
We still can't just close the bug. Suppressions are temporary unless they are marked explicitly permanent with a reason (otherwise, automating which suppressions are obsolete becomes challenging). I will make the necessary changes and flag you for review.
Assignee | ||
Comment 9•4 years ago
|
||
Comment 10•4 years ago
|
||
The priority flag is not set for this bug.
:keeler, could you have a look please?
For more information, please visit auto_nag documentation.
Comment 11•4 years ago
|
||
Pushed by choller@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8924ccb67af0 Add permanent deadlock suppressions for single thread. r=keeler
Comment 12•4 years ago
|
||
bugherder |
Updated•4 years ago
|
Description
•