Closed Bug 1141565 Opened 5 years ago Closed 4 years ago

TSan: data race ipc/chromium/src/base/histogram.cc:730 Accumulate

Categories

(Toolkit :: Telemetry, defect)

x86_64
Linux
defect
Not set

Tracking


RESOLVED FIXED
mozilla46
Tracking Status
firefox46 --- fixed

People

(Reporter: froydnj, Assigned: jseward, NeedInfo)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [tsan][gfx-noted])

Attachments

(2 files, 3 obsolete files)

The attached logfile shows a thread/data race detected by TSan (ThreadSanitizer).

* Specific information about this bug

This is a race that we have always known was a possibility, but now TSan is complaining about it: writing to telemetry histograms from multiple threads.

I'm filing this under ImageLib, rather than IPC or even Toolkit::Telemetry, since ImageLib seems like the heaviest user of cross-thread histograms.  (The network cache would be another, but I haven't seen cross-thread races there in TSan runs, so perhaps there's no cross-thread access to individual histograms...?)  Maybe it's possible to move all the histogram accesses to the main thread?

* General information about TSan, data races, etc.

Typically, races reported by TSan are not false positives, but it is possible that the race is benign. Even so, we should try to come up with a fix unless doing so would cause unacceptable performance issues. Also note that seemingly benign races can still be harmful, depending on the compiler and the architecture [1][2].

If the bug cannot be fixed, then this bug should be used to either make a compile-time annotation for blacklisting or add an entry to the runtime blacklist.

[1] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
[2] _How to miscompile programs with "benign" data races_: https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf
Flags: needinfo?(seth)
Whiteboard: [tsan] → [tsan][gfx-noted]
Do we spew so much Telemetry data that it'd be a problem to just take a lock or use an atomic on the Telemetry side? ImageLib basically always needs to touch this stuff off-main-thread.
Flags: needinfo?(vdjeric)
(In reply to Seth Fowler [:seth] from comment #1)
> Do we spew so much Telemetry data that it'd be a problem to just take a lock
> or use an atomic on the Telemetry side? ImageLib basically always needs to
> touch this stuff off-main-thread.

There are tight loops and other hot code paths that call Telemetry::Accumulate. We could look into using atomics in Telemetry, and I suspect the overhead of atomics would be acceptable, but we'd have to figure out whether any histograms are negatively affected by the extra delay. I don't think it's going to be a quick fix.

For any given histogram, the samples should be accumulated on a single thread; it doesn't have to be the main thread. Does ImageLib run on more than two threads in a session?
Flags: needinfo?(vdjeric)
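To make the atomics option concrete, here is a minimal sketch of what it could look like. AtomicSampleSet and its methods are illustrative stand-ins invented for this sketch, not the actual Histogram::SampleSet API:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Illustrative stand-in for Histogram::SampleSet, showing per-bucket
// counts as atomics so concurrent Accumulate() calls don't race.
class AtomicSampleSet {
 public:
  explicit AtomicSampleSet(size_t buckets) : counts_(buckets) {}

  void Accumulate(size_t bucket, int64_t value) {
    // Relaxed ordering is enough here: each bump only needs to be
    // indivisible, not ordered against unrelated memory traffic.
    counts_[bucket].fetch_add(1, std::memory_order_relaxed);
    sum_.fetch_add(value, std::memory_order_relaxed);
  }

  int64_t count(size_t bucket) const { return counts_[bucket].load(); }
  int64_t sum() const { return sum_.load(); }

 private:
  std::vector<std::atomic<int64_t>> counts_;
  std::atomic<int64_t> sum_{0};
};
```

Note the caveat with this approach: each field is race-free on its own, but counts_ and sum_ are not updated as a unit, so a concurrent reader can still observe them out of sync.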
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #2)
> For any given histogram, the samples should be accumulated on a single
> thread, it doesn't have to be the main thread. Does ImageLib run on more
> than two threads in a session?

Absolutely. Image decoding happens on a thread pool; several threads can be doing it at once.

If a quick fix isn't possible on the Telemetry side we can accumulate our Telemetry and report it later on the main thread. This isn't necessarily a huge deal. I'm just trying to avoid extra synchronization with the main thread where possible in ImageLib.
Flags: needinfo?(seth)
Flags: needinfo?(seth)
Nathan, is this the same race condition as in bug 1142079?
Flags: needinfo?(nfroyd)
See Also: → 1142079
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #4)
> Nathan, is this the same race condition as in bug 1142079?

I don't see that bug 1142079 is necessarily a race condition.  Asking questions in the other bug to clarify.
Flags: needinfo?(nfroyd)
Races in histogram.cc are now amongst the most common races reported during xpcshell tests of m-c.  As of yesterday they comprise 423 reported race summaries out of a total of 3536, that is, 12% of the total.
Easy STR: Apply test infrastructure patch from bug 1222043.
Then:

TSAN_OPTIONS=suppressions=/home/sewardj/MOZ/SUPPS/tsan-empty.txt \
  DISPLAY=:1.0 ./mach xpcshell-test --sequential --log-mach - \
  browser/components/places/tests/unit/test_clearHistory_shutdown.js
> (In reply to Seth Fowler [:seth] from comment #1)
> > Do we spew so much Telemetry data that it'd be a problem to just take a lock
> > or use an atomic on the Telemetry side? ImageLib basically always needs to
> > touch this stuff off-main-thread.
> 
> There are tight loops & other hot code that calls Telemetry::Accumulate. [..]

I'd like to get some actual numbers on this so we can estimate how bad
the damage might be.  I'll see if I can hack up a simple patch that makes
the counters atomic and/or adds more locking, and we can then measure
how many extra global bus operations we get.

Vlad, can you suggest a test that causes these tight loops to run?
Obviously, I want to measure the worst case if possible.
Flags: needinfo?(vladan.bugzilla)
I've paged a lot of this discussion out. Is there a security risk to not synchronizing on these accumulates? Telemetry was never supposed to deliver 100% accurate measurements since the data gets used in aggregate. Losing a few samples in the histograms is ok.

Two *potential* examples of frequent calls to Accumulate from bug 837271:
* WORD_CACHE_MISSES_CONTENT and WORD_CACHE_MISSES_CHROME in gfx/thebes/gfxFont.cpp
* GRADIENT_DURATION in layout/base/nsCSSRendering.cpp

I think you would be better off coming up with a synthetic test. I could also run an analysis (next week) to find the histograms with the most samples per session.
Flags: needinfo?(vladan.bugzilla)
(In reply to Vladan Djeric (:vladan) -- please needinfo! PTO Nov 16/19/20 from comment #9)
> Is there a security risk to not synchronizing on these accumulates?

I don't know, but I think it's unlikely.

> Telemetry was never supposed to deliver 100% accurate measurements since the
> data gets used in aggregate. Losing a few samples in the histograms is ok.

This is part of a wider effort to get rid of all detectable data races in Gecko.
Even if this one doesn't seem so critical, it is pretty common (see comment 6)
and fixing it makes it easier to spot other races during testing.

I put a counter in Histogram::SampleSet::Accumulate to see how often it is
called.  Roughly:

- 270 times browser start to idle, blank tab
- 530 times loading news.bbc.co.uk 
- 1700 times loading techcrunch.com

At that frequency, acquiring and releasing a mutex inside Histogram::SampleSet::Accumulate
seems pretty harmless.  At most that would add two global bus events per call.  That's
two to three orders of magnitude less than the global bus traffic caused by malloc and
free alone, so I'd be surprised if the effect of the extra locking were even measurable.
Attached patch WIP patch. (obsolete) — Splinter Review
WIP Patch.  Adds a mutex to class Histogram::SampleSet and uses it to
protect all methods callable from outside the class.  Temporarily
renames them with a trailing "X" to denote "externally callable".

Adds a few non-locking versions of the "X" methods that can be used
for internal calls inside the class.  These have an "I" suffix.  They
are required to avoid deadlocking: it's unsafe for an "X" method to
call another "X" method, since that would take the mutex twice.

This doesn't just fix the atomic-update problem on the individual
fields in SampleSet, for example counts_ and sum_.  It also allows
locking at the method level, so that related fields can be updated
consistently.  And it's potentially cheaper than making the individual
fields atomic, since a multi-field update can be done with a single
mutex acquire and release.

There are still some potential deadlock situations I need to take care
of.

Removes unused method Histogram::SampleSet::Subtract.

Patch needs to be cleaned up, the X and I suffixes removed, commented,
etc.

According to TSan, this patch removes all Histogram-related races in
the xpcshell tests.
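In outline, the pattern described above looks like the following sketch, with std::mutex / std::lock_guard standing in for Mozilla's Mutex / MutexAutoLock, and with invented field and method names:

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>

// Illustrative stand-in for the patched Histogram::SampleSet (names
// invented).  Public "X" methods take the mutex; private "I" variants
// assume the caller already holds it, so an "X" method never calls
// another "X" method and deadlocks on the non-reentrant mutex.
class SampleSet {
 public:
  void AccumulateX(int64_t value) {
    std::lock_guard<std::mutex> lock(mutex_);
    AccumulateI(value);
  }

  int64_t SumX() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return sum_;
  }

 private:
  void AccumulateI(int64_t value) {
    // Both fields change under one lock, so readers never observe
    // sum_ and count_ out of sync -- something per-field atomics
    // alone could not guarantee.
    sum_ += value;
    ++count_;
  }

  // mutable so that const readers like SumX() can still lock.
  mutable std::mutex mutex_;
  int64_t sum_ = 0;
  int64_t count_ = 0;
};
```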
Attachment #8691081 - Attachment is obsolete: true
Attached patch Cleaned up patch (obsolete) — Splinter Review
Cleaned up patch.  One additional comment: it causes loss of const-ness in
various places, because Histogram::SampleSet gains a Mutex field, and
that needs to be modified (locked) even when reading from the object.  Hence
some Histogram::SampleSet methods and a few Histogram methods lose const-ness.
Attachment #8692882 - Attachment is obsolete: true
Attachment #8692882 - Attachment description: Believed to be functionally, but needs tidying up. → Believed to be functionally complete, but needs tidying up.
Attachment #8692930 - Flags: review?(gfritzsche)
Comment on attachment 8692930 [details] [diff] [review]
Cleaned up patch

Review of attachment 8692930 [details] [diff] [review]:
-----------------------------------------------------------------

Do we have any performance tests (Talos, ...) that cover this and could be used to verify the performance impact?

::: ipc/chromium/src/base/histogram.h
@@ +340,5 @@
> +    // to provide evidence that the object is locked, by supplying a
> +    // const MutexAutoLock& parameter.  The parameter is ignored but
> +    // must be present.  The non-thread-safe method names all end
> +    // with "E" to indicate that evidence of locking must be supplied,
> +    // and to make them easy to identify when reading code.

Is the "E" suffix a standard somewhere in our code?
I personally don't find it very descriptive.
(In reply to Georg Fritzsche [:gfritzsche] from comment #14)
> Do we have any performance tests (Talos, ...) that cover this and could be
> used to verify the performance impact?

I don't know.  But see also, comments 8, 9 and 10 above.

> Is the "E" suffix a standard somewhere in our code?
> I personally don't find it very descriptive.

I don't think it is standard.  I intended it to mean "Evidence" (that
is, you must supply evidence that you hold the object's lock when calling).
Suggestions for a better name / scheme welcome.
(In reply to Julian Seward [:jseward] from comment #15)
> (In reply to Georg Fritzsche [:gfritzsche] from comment #14)
> > Do we have any performance tests (Talos, ...) that cover this and could be
> > used to verify the performance impact?
> 
> I don't know.  But see also, comments 8, 9 and 10 above.

I saw those comments, but i'd like us to be sure about the performance impact.
Avih, do you have good suggestions on automated or manual testing scenarios that are somewhat representative of common browsing sessions?
Flags: needinfo?(avihpit)
[Note, I didn't read all comments since I'm in a bit of a hurry now, but I still wanted to reply.]

Nothing comes to mind, other than existing tests which "exercise" the browser in general. I'm not even sure that telemetry is enabled in talos, so it may not be possible to test the impact of various approaches there; and even if it is, telemetry might not be exercised enough to have a measurable impact - which might actually indicate that overall it's not meaningful.

If we want to measure the performance of something concrete, we should probably write a test to measure exactly that and only that (as much as possible).
Flags: needinfo?(avihpit)
Component: ImageLib → Telemetry
Product: Core → Toolkit
Assignee: nobody → jseward
(In reply to Avi Halachmi (:avih) from comment #17)
> If we want to measure the performance of something concrete, we should probably
> write a test to measure exactly that and only that (as much as possible).

The problem here is that we are checking for potential performance regressions without knowing concrete scenarios now.
Could we just run this through repeated Talos runs and check whether it triggers any regressions there?
Flags: needinfo?(avihpit)
Comment on attachment 8692930 [details] [diff] [review]
Cleaned up patch

Review of attachment 8692930 [details] [diff] [review]:
-----------------------------------------------------------------

If we can figure out how to make sure that we don't regress performance, this looks like an ok approach to me.

Style-wise:
(1) We could avoid dropping the constness if we make the mutex field mutable (unless we have some hard rule against that in our code-base?)
(2) Instead of requiring to pass the mutex to each protected call, we could instead factor the protected calls in a sub-class to do something more RAII oriented.
E.g.:
  ProtectedSampleSet ps = sampleset.GetProtectedSampleSet();
  foo(ps.log_sum(), ...);
Or:
  Histogram::Inconsistencies check = histogram->FindCorruption(ss.GetProtectedSampleSet());
Or does that look like over-design?

::: ipc/chromium/src/base/histogram.cc
@@ -774,5 @@
>    for (size_t index = 0; index < counts_.size(); ++index)
>      counts_[index] += other.counts_[index];
>  }
>  
> -void Histogram::SampleSet::Subtract(const SampleSet& other) {

This was entirely unused?
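For reference, the RAII accessor idea from point (2) above could be sketched roughly as follows; ProtectedSampleSet and GetProtectedSampleSet are hypothetical names from this comment, not code in the tree, and std::mutex stands in for Mozilla's Mutex:

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical sketch of the RAII accessor idea.  The accessor holds
// the lock for its whole lifetime, so everything called through it is
// implicitly protected and no per-call lock argument is needed.
class SampleSet {
 private:
  std::mutex mutex_;
  int64_t sum_ = 0;

 public:
  class ProtectedSampleSet {
   public:
    explicit ProtectedSampleSet(SampleSet& ss)
        : lock_(ss.mutex_), ss_(ss) {}

    int64_t sum() const { return ss_.sum_; }
    void Accumulate(int64_t v) { ss_.sum_ += v; }

   private:
    std::unique_lock<std::mutex> lock_;  // released when this dies
    SampleSet& ss_;
  };

  ProtectedSampleSet GetProtectedSampleSet() {
    return ProtectedSampleSet(*this);
  }
};
```

The accessor's scope then bounds the critical section, e.g. `auto ps = ss.GetProtectedSampleSet(); ps.Accumulate(5);`. One downside is that the lock is held for the accessor's whole lifetime, so the critical section is only as tight as the caller's scoping discipline.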
Attachment #8692930 - Flags: review?(gfritzsche)
(In reply to Georg Fritzsche [:gfritzsche] [away Dec 19 - Jan 3] from comment #18)
> (In reply to Avi Halachmi (:avih) from comment #17)
> > If we want to measure the performance of something concrete, we should probably
> > write a test to measure exactly that and only that (as much as possible).
> 
> The problem here is that we are checking for potential performance
> regressions without knowing concrete scenarios now.
> Could we just run this through repeated Talos runs and check whether it
> triggers any regressions there?

Sure, push to try and test all platforms with 5 retriggers* with and without the questionable patch, then compare the talos results between these revisions using https://treeherder.mozilla.org/perf.html#/comparechooser

[*] try syntax: "try: -b do -p all -u all -t all --rebuild-talos 5"
Flags: needinfo?(avihpit)
Depends on: 1228147
> > The problem here is that we are checking for potential performance
> > regressions without knowing concrete scenarios now.
> > Could we just run this through repeated Talos runs and check whether it
> > triggers any regressions there?
> 
> Sure, push to try and test all platforms with 5 retriggers* with and without
> the questionable patch, then compare the talos results between these
> revisions using https://treeherder.mozilla.org/perf.html#/comparechooser

Results at

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=ede1fdfd214f&newProject=try&newRevision=9822c212f754&framework=1

There seem to be both regressions and speedups.  What is the significance
of this?
Flags: needinfo?(avihpit)
(In reply to Julian Seward [:jseward] from comment #21)
> https://treeherder.mozilla.org/perf.html#/
> compare?originalProject=try&originalRevision=ede1fdfd214f&newProject=try&newR
> evision=9822c212f754&framework=1
> 
> There seem to be both regressions and speedups.  What is the significance
> of this?

The regressions are:
2% for dromaeo-css opt win7-32
2% for svgr-opacity e10s opt win8-64

These seem relatively minor to me, and the svgr one especially seems like it shouldn't be affected at all (I don't think it does many telemetry accumulations, if any).

Overall, I'd attribute those changes to noise and as such - not significant, unless you have an explanation for the improvements/regressions and were expecting at least some of them. I guess you weren't though?
Flags: needinfo?(avihpit)
(In reply to Avi Halachmi (:avih) from comment #22)
> [..] and were
> expecting at least some of them. I guess you weren't though?

No, I wasn't expecting any change.
Another performance data point:

Running the scroll test from bug 894128 with layout.frame_rate=0
(maximum possible scrolling rate), using planet.mozilla.org, 3 times
each with/without the patch, produces the following:

In all cases: Window size: 822 x 584

Without patch:  Avg Interval: 1.44 1.48 1.48 (ms)
                StdDev:       0.40 0.38 0.38 (ms)

With patch:     Avg Interval: 1.49 1.39 1.43 (ms)
                StdDev:       0.40 0.32 0.36 (ms)

It's not obviously slower -- in fact faster, on average.  So I conclude
the difference is unmeasurable at least with this test case.
(In reply to Julian Seward [:jseward] from comment #24)
> Another performance data point:
> 
> Running the scroll test...
>
> It's not obviously slower -- in fact faster, on average.  So I conclude
> the difference is unmeasurable at least with this test case.

Thanks. Just adding why this result is meaningful: AFAIK, the most frequent telemetry accumulations (and therefore the heaviest exercise of the code in this bug) occur during synchronous animation, which said test case performs.

This means there's no measurable regression in our worst-case scenario with the new code.
(In reply to Georg Fritzsche [:gfritzsche] [away Dec 19 - Jan 3] from comment #19)
> Comment on attachment 8692930 [details] [diff] [review]

> Style-wise:
> (1) We could avoid dropping the constness if we make the mutex field mutable
> (unless we have some hard rule against that in our code-base?)

Done.  I think that restores the constness completely.

> (2) Instead of requiring to pass the mutex to each protected call, we could
> instead factor the protected calls in a sub-class to do something more RAII
> oriented.

Hmm, this seems like overkill to me.  OK to skip this?

> > -void Histogram::SampleSet::Subtract(const SampleSet& other) {
> 
> This was entirely unused?

As far as I can see, yes, entirely unused.
Attachment #8702156 - Flags: review?(gfritzsche)
(In reply to Avi Halachmi (:avih) from comment #25)
> (In reply to Julian Seward [:jseward] from comment #24)
> > Another performance data point:
> > 
> > Running the scroll test...
> >
> > It's not obviously slower -- in fact faster, on average.  So I conclude
> > the difference is unmeasurable at least with this test case.
> 
> Thanks. Just adding why this result is meaningful: AFAIK, the most frequent
> telemetry accumulations (and therefore exercising the code at this bug) is
> during synchronous animation which said test case does.
> 
> Therefore it means there's no measurable regression at our worst case
> scenario with the new code.

Great, thanks for figuring this out.
(In reply to Julian Seward [:jseward] from comment #26)
> > (2) Instead of requiring to pass the mutex to each protected call, we could
> > instead factor the protected calls in a sub-class to do something more RAII
> > oriented.
> 
> Hmm, this seems overkill to me.  Ok to skip on this?

Yes - this should be internal to Telemetry.cpp only, so no need for this.
Comment on attachment 8702156 [details] [diff] [review]
Rebased, and with fixes for review comments per comment 26

Review of attachment 8702156 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/chromium/src/base/histogram.h
@@ +340,5 @@
> +    // to provide evidence that the object is locked, by supplying a
> +    // const MutexAutoLock& parameter.  The parameter is ignored but
> +    // must be present.  The non-thread-safe method names all end
> +    // with "E" to indicate that evidence of locking must be supplied,
> +    // and to make them easy to identify when reading code.

I still think the "E" suffix is not very descriptive.
We can actually just drop the suffix IMO; the additional lock argument should be sufficient.

@@ +366,5 @@
> +    //
> +    // The caller must hold |this.mutex_|, and must supply evidence by
> +    // passing a const reference to the relevant MutexAutoLock used.
> +
> +    Count countsE(const MutexAutoLock& ev, size_t i) const { 

Nit: Trailing whitespace.
Attachment #8702156 - Flags: review?(gfritzsche) → feedback+
Comment on attachment 8702156 [details] [diff] [review]
Rebased, and with fixes for review comments per comment 26

r=me with comment 30 addressed.
Attachment #8702156 - Flags: feedback+ → review+
https://hg.mozilla.org/mozilla-central/rev/38fef54dc04e
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
Duplicate of this bug: 1197616