Closed Bug 1298698 Opened 3 years ago Closed 3 years ago

EnsureNextIteration() may leave MediaStreamGraph asleep

Categories

(Core :: Audio/Video: MediaStreamGraph, defect, P1, critical)

44 Branch
Unspecified
All
defect

Tracking

()

RESOLVED FIXED
mozilla51
Tracking Status
firefox48 --- wontfix
firefox49 --- wontfix
firefox-esr45 --- wontfix
firefox50 --- fixed
firefox51 --- fixed

People

(Reporter: jesup, Assigned: jesup)

References

Details

Attachments

(1 file)

+++ This bug was initially created as a clone of Bug #1255737 +++

> We are crashing on shutdown for some users due to the async shutdown timeout 
> abort.  That indicates this code can block shutdown for a minute.

In examining the stacks after bug 1255737, I don't see any indication that we're waiting on a cubeb shutdown (and we shouldn't be with that stack, though we have reports that some audio drivers (esp on windows) might fail to shut down properly (for Chrome).

Thus we must be failing to clear the shutdown tickets elsewhere in the shutdown sequence.  I think I found a hole in EnsureNextIteration, but that wouldn't cause the shutdown failures.

Here's a rough guide to MSG shutdown.  Note that may be multiple graphs, and some will be AudioCallbackDrivers, and others will be SystemClock drivers.
Forgive the shorthand...  This are the relevant operations


shutdown blocker->BlockShutdown()
 in phase profile-before-shutdown
 -> ForceShutdown(ticket)

ForceShutdown(ticket)
  lock the monitor
  mForceShutdown=true
  remember ticket
  ensure graph started (note: unlock for Start())
  EnsureNextIterationLocked (to process shutdown)

OneIteration() (called from ThreadedDriver::RunThread(), or from
AudioCallbackDriver::DataCallback()):
  UpdateMainThreadState()
    mLifecycleState = WAITING_FOR_MAIN_THREAD_CLEANUP

RunInStableState()
  if mForceShutdown && mState == WAITING_FOR_MAIN_THREAD_CLEANUP
    move BackMessageQueue to controlMessagesToRun (out of the lock)
    mLifecycleState = WAITING_FOR_THREAD_SHUTDOWN
    Send ShutDownRunnable to MainThread
  for all controlMessagesToRun
    msg->RunDuringShutdown()
  mNotDetectedRunning = true

ShutDownRunnable::Run()
  with timer
  Driver->Shutdown()
  clear blocking ticket
  if (IsEmpty())
    Destroy()
  else
    for all streams
      stream->NotifyMediaStreamGraphShutdown()
    mLifecycleState = WAITING_FOR_STREAM_DESTRUCTION
    // Note: ticket already released; shutdown can continue here!!?
    // Destroy will be called on later AppendMessage once empty


EnsureNextIteration{Locked}
  mNeedAnotherIteration = true // Atomic
  if (mGraphDriverAsleep) // Atomic -- note only SystemClock drivers sleep
     {lock}
     WakeUp driver

// is the default Atomic mode sequentially consistent? Yes, so
// mNeedAnotherIteration is guaranteed to be visibly set before testing
// asleep.  However....

// SystemClockDriver:
thread a (Graph thread): (under monitor lock)
  test mNeedAnotherIteration -> false

thread b (MainThread): (no lock)
  EnsureNextIteration() (Note: not the locked version!)
    mNeedAnotherIteration = true
    test mGraphDriverAsleep -> false

thread a: (still under monitor lock)
  mGraphDriverAsleep = true
  mWaitState = WAITSTATE_WAITING_INDEFINITELY

Thread a will stay asleep, since thread b doesn't know to wake it up.

I don't think that can be the cause here, since ForceShutdown() uses EnsureNextIterationLocked().
MozReview-Commit-ID: EFfyqYv9Nty
Attachment #8785712 - Flags: review?(karlt)
Assignee: nobody → rjesup
Status: NEW → ASSIGNED
Adreas, Karl, Paul, jib - please look at the sequence above and see if you see any more holes (and the code; this summary just helps unravel it)
Rank: 10
Flags: needinfo?(pehrson)
Flags: needinfo?(jib)
Comment on attachment 8785712 [details] [diff] [review]
Block race between EnsureNextIteration and WaitForNextIteration

Yes, there is a race here and this fixes it, thanks.

Interesting that the test on mWaitState when mNeedAnotherIteration was always
false, until this change, which now makes it is necessary to keep that block.

>+  // we could set mGraphDriverAsleep, so we must re-test it.  (EnsureNextIteration
>+  // sets mNeedAnotherIteration, then tests mGraphDriverAsleep

Need to keep lines to 80 columns and add a period and closing parenthesis at the end.

It would be helpful to describe at their definition which threads read and
write mNeedAnotherIteration and mGraphDriverAsleep under which conditions, if
you can.

I think a similar race exists if mGraphDriverAsleep is tested in
EnsureNextIteration() immediately after the graph thread has woken spuriously
from a WAITSTATE_WAITING_INDEFINITELY Wait().
SystemClockDriver::WakeUp() will then set mWaitState to WAITSTATE_WAKING_UP
after the graph thread sets WAITSTATE_RUNNING.  This doesn't matter in
practice because the only two states of interest are
WAITSTATE_WAITING_INDEFINITELY and not WAITSTATE_WAITING_INDEFINITELY.  A
comment to explain that may be helpful.
Attachment #8785712 - Flags: review?(karlt) → review+
Pushed by rjesup@wgate.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c48ae8bb49dc
Block race between EnsureNextIteration and WaitForNextIteration r=karlt
I don't see any holes, but this logic seems brittle and hard to reason about (at least for me). Would it simplify the code to either put these under a lock, or perhaps combine the two booleans (mNeedAnotherIteration and mGraphDriverAsleep) into a single four-state atomic value? Two atoms are no longer atomic.
Flags: needinfo?(jib)
https://hg.mozilla.org/mozilla-central/rev/c48ae8bb49dc
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
I think what jib said seems like a good idea in the short-term.

Apart from that I wonder what good reasons there are for not making mGraphDriverAsleep a member of GraphDriver?

We might find that it makes things cleaner to more clearly separate concerns of MSG and GraphDriver and make use of higher level constructs for syncing state between them, like Watchable, Canonical, Mirror etc. that are very prevalent in MediaDecoder with friends.
Flags: needinfo?(pehrson)
Probably the primary reason for using Atomics here is that this is a very perf-critical path, and in particular we don't want the Graph blocked on a monitor any more than absolutely necessary (I'd love for it to be not at all in the normal case) -- and if we must block, block once.  

The ideal would be that the AudioCallbackDriver::DataCallback() wouldn't lock anything in the normal case, and would use lockless circular buffers and Atomics/memory-barriers for all consistency issues.  We aren't there yet.

A lock around those two atomic refs would add another blocking possibility that could drop the effective priority (Priority Inversion) of the Graph thread down to MainThread/etc's.
Making them a unified, atomic, state enum still sounds like a good idea as jib said.
Comment on attachment 8785712 [details] [diff] [review]
Block race between EnsureNextIteration and WaitForNextIteration

Approval Request Comment
[Feature/regressing bug #]: N/A

[User impact if declined]: Graphs may fail to wake up when requested.  It's unclear the total impact of this, but it could cause general failures of WebAudio and perhaps (maybe) shutdown hangs.

[Describe test coverage new/current, TreeHerder]: Well covered in general

[Risks and why]: extremely low risk.  Want to solve any latent problems (which would be impossible to track back to this), and see also if it helps with AsyncShutdownBlocking hangs.

[String/UUID change made/needed]: none
Attachment #8785712 - Flags: approval-mozilla-aurora?
Comment on attachment 8785712 [details] [diff] [review]
Block race between EnsureNextIteration and WaitForNextIteration

Might help with shutdown hang situation, worth a try, Aurora50+
Attachment #8785712 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.