Open Bug 1712171 Opened 3 years ago Updated 2 years ago

ThreadSanitizer: data race [@ mozilla::AudioBufferAddWithScale_SSE] vs. [@ ???]

Categories

(Core :: Web Audio, defect, P2)

defect

Tracking

()

Tracking Status
firefox90 --- affected

People

(Reporter: tsmith, Unassigned)

References

(Blocks 2 open bugs)

Details

(Keywords: csectype-race, testcase)

Attachments

(2 files)

Attached file testcase.html

The attached crash information was detected by ThreadSanitizer while using build mozilla-central 20210513-940a3ad12e3d.

General information about TSan reports

Why fix races?

Data races are undefined behavior and can cause crashes as well as correctness issues. Compiler optimizations can cause racy code to have unpredictable and hard-to-reproduce behavior.

Rating

If you think this race can cause crashes or correctness issues, it would be great to rate the bug appropriately as P1/P2 and/or indicating this in the bug. This makes it a lot easier for us to assess the actual impact that these reports make and if they are helpful to you.

False Positives / Benign Races

Typically, races reported by TSan are not false positives [1], but it is possible that the race is benign. Even in this case it would be nice to come up with a fix if it is easily doable and does not regress performance. Every race that we cannot fix will have to remain on the suppression list and slows down the overall TSan performance. Also note that seemingly benign races can possibly be harmful (also depending on the compiler, optimizations and the architecture) [2][3].

[1] One major exception is the involvement of uninstrumented code from third-party libraries.
[2] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
[3] How to miscompile programs with "benign" data races: https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf

Suppressing unfixable races

If the bug cannot be fixed, then a runtime suppression needs to be added in mozglue/build/TsanOptions.cpp. The suppressions match on the full stack, so it should be picked such that it is unique to this particular race. The bug number of this bug should also be included so we have some documentation on why this suppression was added.

Flags: in-testsuite?

The second stack is not being populated. I'm not sure what's up with that.

The ReverbAccumulationBuffer::m_buffer ringbuffer was allocated for a 23 second impulse response.

The race was caught writing to an offset of 49680 (388.125 blocks of 128, about 1%) from the start of the buffer.

The rendering thread is caught writing to the buffer for one of its (early) convolver stages while the background "ConvolverWorker" thread is writing to the same point. There is no stack for the background thread but it would be writing one of its (latter) convolver stages.

The last rendering thread stage writes at an offset of 80 blocks ahead of the m_accumulationBuffer read index.
The first background thread stage writes at an offset of 96 blocks ahead of the m_accumulationBuffer read index.
If the background thread gets behind, as it will for large enough responses, it can race with the rendering thread because there are no locks on the buffer.

If the background and rendering threads instead used separate accumulation buffers, that would increase the available headroom by a factor of 6. The multiply-add read/writes would no longer race, but the background thread could still get far enough behind to race with the rendering thread read and reset.

If this happens, the output will be corrupted anyway, but perhaps atomics could be used to avoid reading before the write and writing after the reset, or perhaps the background thread could just back-off its write if not safely ahead of the rendering thread.

Synchronization of writes from rendering to background thread is via a ReleaseAcquire atomic.

Synchronization of writes from background to rendering thread is more vague.

The racing reads and writes involve float data, which is not used for any array indexing, so security risk is limited to the theoretical possibility of the compiler using the same memory for other purposes.

Severity: -- → S3
Priority: -- → P2

security risk is limited to the theoretical possibility of the compiler using the same memory for other purposes.

that would be a completely different kind of bug, and we hope detected by our ASAN fuzzing.
Updated: misunderstood what you were saying. Right, a compiler could in theory do something crazy in optimization if it thought no one was going to touch that memory for a bit. ASAN would not detect that. Seems unlikely enough to not be worth hiding the bug in this particular case, especially given the nature of the data.

Group: media-core-security
Blocks: tsan
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: