Closed Bug 1203585 Opened 6 years ago Closed 5 years ago

TSan: data race dom/media/GraphDriver.h:97 Switching()

Categories

(Core :: Audio/Video: MediaStreamGraph, defect, P1)

x86_64
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla46
Tracking Status
firefox46 --- fixed

People

(Reporter: froydnj, Assigned: padenot)

References

(Blocks 1 open bug)

Details

(Whiteboard: [tsan])

Attachments

(8 files)

The attached logfile shows a thread/data race detected by TSan (ThreadSanitizer).

* Specific information about this bug

This is a straightforward race where we're writing GraphDriver::mPreviousDriver on an MSG thread (?) and we're reading it on a thread ALSA has set up for us.  There does not seem to be any locking around the read or write.

* General information about TSan, data races, etc.

Typically, races reported by TSan are not false positives, but it is possible that the race is benign. Even in this case though, we should try to come up with a fix unless this would cause unacceptable performance issues. Also note that seemingly benign races can possibly be harmful (also depending on the compiler and the architecture) [1][2].

If the bug cannot be fixed, then this bug should be used to either make a compile-time annotation for blacklisting or add an entry to the runtime blacklist.

[1] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
[2] _How to miscompile programs with "benign" data races_: https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf
Component: Audio/Video → Audio/Video: MSG/cubeb/GMP
Assignee: nobody → padenot
Attached file TSan stack trace
Paul pointed out things work better when log files actually get attached.
Comment on attachment 8659936 [details] [diff] [review]
Lock around access to mNextDriver and mPreviousDriver. r=

r-: NextDriver() accesses mNextDriver without the monitor, and it's not clear it's called with the monitor locked - if so, it should be documented (and checked) and perhaps asserted.

This points out that the documentation of what's covered by the monitor and what's not is inconsistent.  mPromisesForOperation appears to need the monitor, but doesn't have "This is synchronized by the graph's Monitor".  Apparently mNextDriver and mPreviousDriver need that statement too.  SwitchAtNextIteration() should document it needs the monitor too (and appears to always have it).

Most uses of mNextDriver are tied to mGraphImpl->SetCurrentDriver(), which likely does need the monitor.  It's not clear that all cases do, especially calls to Switching().  This makes me wonder if there are lower-cost alternatives (Atomics?), though the need for Switching to check mNextDriver and mPreviousDriver implies some form of lock/semaphore/monitor is needed - but it also appears there may be more unlocked references to mPreviousDriver.

So I think more cleanup and especially more documentation/checking is needed.  Perhaps it just needs documentation and/or an assert or two.
Attachment #8659936 - Flags: review?(rjesup) → review-
Should we try to update and land this?  Lack of locking on access to these could cause "interesting" problems even without help of TSAN compiler weirdness.

If we're sure it's safe outside of the compiler doing an evil TSAN trick, we can reduce the priority.
Rank: 19
Flags: needinfo?(padenot)
Priority: -- → P1
Bug 1203585 - Add a comment block on how MediaStreamGraph switch GraphDrivers. r?jesup
Attachment #8694180 - Flags: review?(rjesup)
Bug 1203585 - Add new methods to GraphDriver to assert that locks are held. r?jesup
Attachment #8694181 - Flags: review?(rjesup)
Bug 1203585 - Add threading assertions to GraphDriver switching methods. r?jesup
Attachment #8694182 - Flags: review?(rjesup)
Bug 1203585 - Update the MediaStreamGraph code to lock properly. r?jesup
Attachment #8694183 - Flags: review?(rjesup)
Bug 1203585 - Add comments about threading and locking on GraphDriver's members. r?jesup
Attachment #8694184 - Flags: review?(rjesup)
Bug 1203585 - Remove some dead code in GraphDriver.cpp. r?jesup
Attachment #8694185 - Flags: review?(rjesup)
Comment on attachment 8694180 [details]
MozReview Request: Bug 1203585 - Add a comment block on how MediaStreamGraph switch GraphDrivers. r?jesup

https://reviewboard.mozilla.org/r/26685/#review24305

::: dom/media/GraphDriver.h:66
(Diff revision 1)
> + * scenarii that can happe:

nit: happen

::: dom/media/GraphDriver.h:72
(Diff revision 1)
> + * - The thread T is posting a message to the main thread to terminate itself.

nit: Threat T posts a message

::: dom/media/GraphDriver.h:81
(Diff revision 1)
> + * - At the end of the MSG iteration, the SystemClockDriver transfers its timing

Is mNextDriver used here?

::: dom/media/GraphDriver.h:86
(Diff revision 1)
> + *   thread to shut it down.

it -> the SystemClockDriver thread

::: dom/media/GraphDriver.h:93
(Diff revision 1)
> + *   starts the SystemClockDriver. This starts a thread.

This starts a thread -> A new SystemClockDriver thread is started from the current audio thread.

It may make sense to scrub this to have consistent names for the threads.

::: dom/media/GraphDriver.h:101
(Diff revision 1)
> + * of the different attributes of drivers, and they access pattern is documented

nit: they->their
Attachment #8694180 - Flags: review?(rjesup) → review+
Comment on attachment 8694181 [details]
MozReview Request: Bug 1203585 - Add new methods to GraphDriver to assert that locks are held. r?jesup

https://reviewboard.mozilla.org/r/26687/#review24309

::: dom/media/GraphDriver.cpp:199
(Diff revision 1)
> +    GraphDriver* previousDriver = nullptr;

= nullptr is spurious here.  up to you though

::: dom/media/GraphDriver.cpp:219
(Diff revision 1)
> +        mDriver->SetPreviousDriver(nullptr);

Is there any chance something could run and want to mess with mDriver's PreviousDriver value while we're in this Run()?  Would it make sense to simply hold the lock longer in this function?
Attachment #8694181 - Flags: review?(rjesup) → review+
Comment on attachment 8694182 [details]
MozReview Request: Bug 1203585 - Add threading assertions to GraphDriver switching methods. r?jesup

https://reviewboard.mozilla.org/r/26689/#review24315
Attachment #8694182 - Flags: review?(rjesup) → review+
Comment on attachment 8694183 [details]
MozReview Request: Bug 1203585 - Update the MediaStreamGraph code to lock properly. r?jesup

https://reviewboard.mozilla.org/r/26691/#review24317
Attachment #8694183 - Flags: review?(rjesup) → review+
Comment on attachment 8694184 [details]
MozReview Request: Bug 1203585 - Add comments about threading and locking on GraphDriver's members. r?jesup

https://reviewboard.mozilla.org/r/26693/#review24319

::: dom/media/GraphDriver.h:251
(Diff revision 1)
> +  // durining the initialization.

during

::: dom/media/GraphDriver.h:257
(Diff revision 1)
> +  // This written to when changing driver, from the previous driver's thread, or

nit: This is written

::: dom/media/GraphDriver.h:259
(Diff revision 1)
> +  // and read each time we need to check whether we're changing driver (in

"At this point and read each time" - reword
Attachment #8694184 - Flags: review?(rjesup) → review+
Comment on attachment 8694185 [details]
MozReview Request: Bug 1203585 - Remove some dead code in GraphDriver.cpp. r?jesup

https://reviewboard.mozilla.org/r/26695/#review24321
Attachment #8694185 - Flags: review?(rjesup) → review+
Clearing NI.
Flags: needinfo?(padenot)
AudioCallbackDriver::Start() didn't reset mPreviousDriver until after calling
Init().
https://hg.mozilla.org/mozilla-central/annotate/61079e59ef35/dom/media/GraphDriver.cpp#l627

Init() called(/calls) AudioCallbackDriver::StartStream(), which set(s) mStarted
after cubeb_stream_start().

cubeb_stream_start() triggered(/triggers) MediaStreamGraphImpl::Process().

MediaStreamGraphImpl::Process() was testing Switching(),
https://hg.mozilla.org/mozilla-central/annotate/61079e59ef35/dom/media/MediaStreamGraph.cpp#l1441

Switching() is/was mNextDriver || mPreviousDriver.

The race was later fixed when
https://hg.mozilla.org/mozilla-central/rev/ef761a32a969178df81a61bc2d0a376dde2eec03#l1.160
moved the reset of mPreviousDriver from after Init() to before Init(). 

I expect that made the monitor unnecessary (and I doubt the locking around the Switching()
call was helpful).

(https://hg.mozilla.org/mozilla-central/rev/ef761a32a969178df81a61bc2d0a376dde2eec03#l1.309
 also moved the mixer RemoveCallback() from MediaStreamGraphImpl::Process() to
 AudioCallbackDriver::DataCallback() et al, where the driver switch is
 performed.)
You need to log in before you can comment on or make changes to this bug.