Closed
Bug 1298698
Opened 8 years ago
Closed 8 years ago
EnsureNextIteration() may leave MediaStreamGraph asleep
Categories
(Core :: Audio/Video: MediaStreamGraph, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla51
People
(Reporter: jesup, Assigned: jesup)
References
Details
Attachments
(1 file)
3.54 KB,
patch
|
karlt
:
review+
ritu
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
+++ This bug was initially created as a clone of Bug #1255737 +++ > We are crashing on shutdown for some users due to the async shutdown timeout > abort. That indicates this code can block shutdown for a minute. In examining the stacks after bug 1255737, I don't see any indication that we're waiting on a cubeb shutdown (and we shouldn't be with that stack, though we have reports that some audio drivers (esp on windows) might fail to shut down properly (for Chrome). Thus we must be failing to clear the shutdown tickets elsewhere in the shutdown sequence. I think I found a hole in EnsureNextIteration, but that wouldn't cause the shutdown failures. Here's a rough guide to MSG shutdown. Note that may be multiple graphs, and some will be AudioCallbackDrivers, and others will be SystemClock drivers. Forgive the shorthand... This are the relevant operations shutdown blocker->BlockShutdown() in phase profile-before-shutdown -> ForceShutdown(ticket) ForceShutdown(ticket) lock the monitor mForceShutdown=true remember ticket ensure graph started (note: unlock for Start()) EnsureNextIterationLocked (to process shutdown) OneIteration() (called from ThreadedDriver::RunThread(), or from AudioCallbackDriver::DataCallback()): UpdateMainThreadState() mLifecycleState = WAITING_FOR_MAIN_THREAD_CLEANUP RunInStableState() if mForceShutdown && mState == WAITING_FOR_MAIN_THREAD_CLEANUP move BackMessageQueue to controlMessagesToRun (out of the lock) mLifecycleState = WAITING_FOR_THREAD_SHUTDOWN Send ShutDownRunnable to MainThread for all controlMessagesToRun msg->RunDuringShutdown() mNotDetectedRunning = true ShutDownRunnable::Run() with timer Driver->Shutdown() clear blocking ticket if (IsEmpty()) Destroy() else for all streams stream->NotifyMediaStreamGraphShutdown() mLifecycleState = WAITING_FOR_STREAM_DESTRUCTION // Note: ticket already released; shutdown can continue here!!? // Destroy will be called on later AppendMessage once empty EnsureNextIteration{Locked} mNeedAnotherIteration = true // Atomic if (mGraphDriverAsleep) // Atomic -- note only SystemClock drivers sleep {lock} WakeUp driver // is the default Atomic mode sequentially consistent? Yes, so // mNeedAnotherIteration is guaranteed to be visibly set before testing // asleep. However.... // SystemClockDriver: thread a (Graph thread): (under monitor lock) test mNeedAnotherIteration -> false thread b (MainThread): (no lock) EnsureNextIteration() (Note: not the locked version!) mNeedAnotherIteration = true test mGraphDriverAsleep -> false thread a: (still under monitor lock) mGraphDriverAsleep = true mWaitState = WAITSTATE_WAITING_INDEFINITELY Thread a will stay asleep, since thread b doesn't know to wake it up. I don't think that can be the cause here, since ForceShutdown() uses EnsureNextIterationLocked().
Assignee | ||
Comment 1•8 years ago
|
||
MozReview-Commit-ID: EFfyqYv9Nty
Attachment #8785712 -
Flags: review?(karlt)
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → rjesup
Status: NEW → ASSIGNED
Assignee | ||
Comment 2•8 years ago
|
||
Adreas, Karl, Paul, jib - please look at the sequence above and see if you see any more holes (and the code; this summary just helps unravel it)
Rank: 10
Flags: needinfo?(pehrson)
Flags: needinfo?(jib)
Comment 3•8 years ago
|
||
Comment on attachment 8785712 [details] [diff] [review] Block race between EnsureNextIteration and WaitForNextIteration Yes, there is a race here and this fixes it, thanks. Interesting that the test on mWaitState when mNeedAnotherIteration was always false, until this change, which now makes it is necessary to keep that block. >+ // we could set mGraphDriverAsleep, so we must re-test it. (EnsureNextIteration >+ // sets mNeedAnotherIteration, then tests mGraphDriverAsleep Need to keep lines to 80 columns and add a period and closing parenthesis at the end. It would be helpful to describe at their definition which threads read and write mNeedAnotherIteration and mGraphDriverAsleep under which conditions, if you can. I think a similar race exists if mGraphDriverAsleep is tested in EnsureNextIteration() immediately after the graph thread has woken spuriously from a WAITSTATE_WAITING_INDEFINITELY Wait(). SystemClockDriver::WakeUp() will then set mWaitState to WAITSTATE_WAKING_UP after the graph thread sets WAITSTATE_RUNNING. This doesn't matter in practice because the only two states of interest are WAITSTATE_WAITING_INDEFINITELY and not WAITSTATE_WAITING_INDEFINITELY. A comment to explain that may be helpful.
Attachment #8785712 -
Flags: review?(karlt) → review+
Pushed by rjesup@wgate.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/c48ae8bb49dc Block race between EnsureNextIteration and WaitForNextIteration r=karlt
Comment 5•8 years ago
|
||
I don't see any holes, but this logic seems brittle and hard to reason about (at least for me). Would it simplify the code to either put these under a lock, or perhaps combine the two booleans (mNeedAnotherIteration and mGraphDriverAsleep) into a single four-state atomic value? Two atoms are no longer atomic.
Flags: needinfo?(jib)
Comment 6•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/c48ae8bb49dc
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
status-firefox51:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Comment 7•8 years ago
|
||
I think what jib said seems like a good idea in the short-term. Apart from that I wonder what good reasons there are for not making mGraphDriverAsleep a member of GraphDriver? We might find that it makes things cleaner to more clearly separate concerns of MSG and GraphDriver and make use of higher level constructs for syncing state between them, like Watchable, Canonical, Mirror etc. that are very prevalent in MediaDecoder with friends.
Flags: needinfo?(pehrson)
Assignee | ||
Comment 8•8 years ago
|
||
Probably the primary reason for using Atomics here is that this is a very perf-critical path, and in particular we don't want the Graph blocked on a monitor any more than absolutely necessary (I'd love for it to be not at all in the normal case) -- and if we must block, block once. The ideal would be that the AudioCallbackDriver::DataCallback() wouldn't lock anything in the normal case, and would use lockless circular buffers and Atomics/memory-barriers for all consistency issues. We aren't there yet. A lock around those two atomic refs would add another blocking possibility that could drop the effective priority (Priority Inversion) of the Graph thread down to MainThread/etc's.
Assignee | ||
Updated•8 years ago
|
status-firefox48:
--- → wontfix
status-firefox49:
--- → affected
status-firefox50:
--- → affected
status-firefox-esr45:
--- → wontfix
Comment 9•8 years ago
|
||
Making them a unified, atomic, state enum still sounds like a good idea as jib said.
Assignee | ||
Comment 10•8 years ago
|
||
Comment on attachment 8785712 [details] [diff] [review] Block race between EnsureNextIteration and WaitForNextIteration Approval Request Comment [Feature/regressing bug #]: N/A [User impact if declined]: Graphs may fail to wake up when requested. It's unclear the total impact of this, but it could cause general failures of WebAudio and perhaps (maybe) shutdown hangs. [Describe test coverage new/current, TreeHerder]: Well covered in general [Risks and why]: extremely low risk. Want to solve any latent problems (which would be impossible to track back to this), and see also if it helps with AsyncShutdownBlocking hangs. [String/UUID change made/needed]: none
Attachment #8785712 -
Flags: approval-mozilla-aurora?
Assignee | ||
Updated•8 years ago
|
Comment on attachment 8785712 [details] [diff] [review] Block race between EnsureNextIteration and WaitForNextIteration Might help with shutdown hang situation, worth a try, Aurora50+
Attachment #8785712 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Assignee | ||
Comment 12•8 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/d35ac86c5271
You need to log in
before you can comment on or make changes to this bug.
Description
•