Prototype state-watching machinery to reduce notification bugs in media code

RESOLVED FIXED in Firefox 40

Status

()

defect
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: bholley, Assigned: bholley)

Tracking

unspecified
mozilla40
x86
macOS
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox40 fixed)

Details

Attachments

(5 attachments, 2 obsolete attachments)

Another common class of media race condition occurs when a value gets updated, but that update doesn't get processed. For example, we've seen various situation where a piece of state gets updated in the state machine, but we neglect to trigger HTMLMediaElement::UpdateReadyStateForData. This leads to missing events, inconsistent data, etc - and often happens intermittently.

I have a rough strategy to handle this with explicitly-watchable state. To work properly we'll first need the state mirroring bug 1144481, so I'm going to do that first.
Blocks: MediaMonitor
My prototype usage of this for ready state updating is mostly green:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=550d7fe11ec9

As usual, there are some intermittent android/b2g failures. Trying to get to the bottom of them:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9f62f2e09072
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e92a133a7684
Depends on: 1154802
cajbir said we should do this in the media meeting.
Attachment #8594403 - Flags: review?(cpearce)
Attachment #8594404 - Flags: review?(cpearce)
Attachment #8594405 - Flags: review?(cpearce)
(State mirroring stuff is coming in bug 1144481, just have one more failure to look at).
Comment on attachment 8594403 [details] [diff] [review]
Part 1 - Fix Bogus test. v1

Review of attachment 8594403 [details] [diff] [review]:
-----------------------------------------------------------------

I think I recall cajbir saying, but best to let him confirm.
Attachment #8594403 - Flags: review?(cpearce) → review?(cajbir.bugzilla)
Attachment #8594403 - Flags: review?(cajbir.bugzilla) → review+
Comment on attachment 8594405 [details] [diff] [review]
Part 3 - Implement state watching machinery. v1

Review of attachment 8594405 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/StateWatching.h
@@ +91,5 @@
> + * WatchTarget is a superclass from which all watchable things must inherit.
> + * Unlike AbstractWatcher, it is a fully-implemented Mix-in, and the subclass
> + * needs only to invoke NotifyWatchers when something changes.
> + */
> +class WatchTarget

Unlike MediaPromise, this class is not thread-safe. Need some comments to note the client code which is most likely our media code.

@@ +197,5 @@
> +    nsCOMPtr<nsIRunnable> r = NS_NewRunnableMethod(this, &Watcher::FireCallbacks);
> +    AbstractThread::GetCurrent()->TailDispatcher().AddDirectTask(r.forget());
> +
> +    // Notify dependent watchers.
> +    NotifyWatchers();

I am worried we have 2 inconsistent ways to notify the subscribers of Watcher. One is weak callback and the other is via AbstractWatcher::Notify. For example, if Watcher::Notify is called multiple times in an event cycle (as far as AbstractThread is concerned), the weak callback will fire only once however AbstractWatcher::Notify will be called multiple times. I would prefer a notification model that is consistent with that of MediaPromise where resove/reject will notify it consumers in an async way.

@@ +222,5 @@
> +    }
> +  }
> +
> +private:
> +  class AbstractMethodCall

Doesn't NS_NewNonOwningRunnableMethod serve the same purpose?
Comment on attachment 8594405 [details] [diff] [review]
Part 3 - Implement state watching machinery. v1

Review of attachment 8594405 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/StateWatching.h
@@ +197,5 @@
> +    nsCOMPtr<nsIRunnable> r = NS_NewRunnableMethod(this, &Watcher::FireCallbacks);
> +    AbstractThread::GetCurrent()->TailDispatcher().AddDirectTask(r.forget());
> +
> +    // Notify dependent watchers.
> +    NotifyWatchers();

Oops. AbstractWatcher::Notify won't be called multiple times for there is |mNotifying|. But it is called at a time different from the weak callback and might see different values of the watch target.
(In reply to JW Wang [:jwwang] from comment #9)
> > +class WatchTarget
> 
> Unlike MediaPromise, this class is not thread-safe. Need some comments to
> note the client code which is most likely our media code.

Good point. I also noticed that I'd accidentally given AbstractWatcher threadsafe refcounting, which I've fixed.
> I am worried we have 2 inconsistent ways to notify the subscribers of
> Watcher. One is weak callback and the other is via AbstractWatcher::Notify.
> For example, if Watcher::Notify is called multiple times in an event cycle
> (as far as AbstractThread is concerned), the weak callback will fire only
> once however AbstractWatcher::Notify will be called multiple times. I would
> prefer a notification model that is consistent with that of MediaPromise
> where resove/reject will notify it consumers in an async way.


Well, the assumption in the code is that dependent watchers are all themselves Watchers, which means that they'll do the same AddDirectTask dance to invoke any actual callbacks that get run. But I guess it might be useful someday to implement another kind of AbstractWatcher implementation, at which point this distinction becomes more relevant. I'll move the NotifyWatchers call into the tail-dispatched method.

> Doesn't NS_NewNonOwningRunnableMethod serve the same purpose?

Very good point!
Updated to address jww's feedback.
Attachment #8594405 - Attachment is obsolete: true
Attachment #8594405 - Flags: review?(cpearce)
Attachment #8594745 - Flags: review?(cpearce)
Comment on attachment 8594404 [details] [diff] [review]
Part 2 - Implement direct tasks. v1

Chris says he's happy to delegate this to jww.
Attachment #8594404 - Flags: review?(cpearce) → review?(jwwang)
Attachment #8594406 - Flags: review?(cpearce) → review?(jwwang)
Attachment #8594745 - Flags: review?(cpearce) → review?(jwwang)
Comment on attachment 8594404 [details] [diff] [review]
Part 2 - Implement direct tasks. v1

Review of attachment 8594404 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/MediaStreamGraph.cpp
@@ +1823,5 @@
> +    // "Direct" tail dispatcher are supposed to run immediately following the
> +    // execution of the current task. So the meta-tasking that we do here in
> +    // RunInStableState() breaks that abstraction a bit unless we handle it here.
> +    //
> +    // This is particularly important because we can end up with a "stream

I don't quite understand this. Can you elaborate it?

@@ +1828,5 @@
> +    // ended" notification immediately following a "stream available" notification,
> +    // and we need to make sure that the watcher responding to "stream available"
> +    // has a chance to run before the second notification starts tearing things
> +    // down.
> +    tailDispatcher.DrainDirectTasks();

Isn't this supposed to be called outside the loop?

::: dom/media/TaskDispatcher.h
@@ +196,5 @@
>      return nullptr;
>    }
>  
> +  // Direct tasks.
> +  std::queue<nsCOMPtr<nsIRunnable>> mDirectTasks;

Why not nsTArray?
Comment on attachment 8594745 [details] [diff] [review]
Part 3 - Implement state watching machinery. v2

Review of attachment 8594745 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/StateWatching.h
@@ +122,5 @@
> +protected:
> +  void NotifyWatchers()
> +  {
> +    WATCH_LOG("%s[%p] notifying watchers\n", mName, this);
> +    PruneWatchers();

I think we can be as lazy as only pruning watchers in NotifyWatchers() without doing that in {Add,Remove}Watcher.

@@ +138,5 @@
> +  {
> +    for (int i = 0; i < (int) mWatchers.Length(); ++i) {
> +      if (mWatchers[i]->IsDestroyed()) {
> +        mWatchers.RemoveElementAt(i);
> +        --i;

You can do reverse iteration to remove the elements with this outstanding decrement.
Attachment #8594745 - Flags: review?(jwwang) → review+
Attachment #8594406 - Flags: review?(jwwang) → review+
(In reply to JW Wang [:jwwang] from comment #14)
 
> I don't quite understand this. Can you elaborate it?

See below.
 
> Isn't this supposed to be called outside the loop?

No. The issue is that the stream graph code has a meta runnable that executes a series of callbacks (such as "a stream was created" and "a stream has finished"). The current architecture requires that we respond to each update before processing the next one. On trunk, HTMLMediaElement::NotifyMediaStreamTracksAvailable invokes UpdateReadyStateForData, which pushes us past HAVE_METADATA and gets the ball rolling.

When we hoist update processing into direct tasks on the tail dispatcher, we run into trouble, because both "stream available" and "stream ended" run before we get around to invoking UpdateReadyStateForData. This causes us to get stuck in HAVE_METADATA (because HasVideo is false again). The existing src stream architecture is a bit shallow, but I just want to keep the invariants as the code expects.

> > +  // Direct tasks.
> > +  std::queue<nsCOMPtr<nsIRunnable>> mDirectTasks;
> 
> Why not nsTArray?

Because nsTArray has pessimal performance when popping from the front. This is the same reason that MediaTaskQueue uses std::queue.
(In reply to Bobby Holley (:bholley) from comment #16)
> No. The issue is that the stream graph code has a meta runnable that
> executes a series of callbacks (such as "a stream was created" and "a stream
> has finished"). The current architecture requires that we respond to each
> update before processing the next one. On trunk,
> HTMLMediaElement::NotifyMediaStreamTracksAvailable invokes
> UpdateReadyStateForData, which pushes us past HAVE_METADATA and gets the
> ball rolling.
> 
> When we hoist update processing into direct tasks on the tail dispatcher, we
> run into trouble, because both "stream available" and "stream ended" run
> before we get around to invoking UpdateReadyStateForData. This causes us to
> get stuck in HAVE_METADATA (because HasVideo is false again). The existing
> src stream architecture is a bit shallow, but I just want to keep the
> invariants as the code expects.

I see. But it is still suspicious to drain all direct tasks in the 1st iteration of the loop. The direct listeners waiting for "stream ended" will miss it since they are all consumed in the 1st iteration right after "stream available" is processed, right?
(In reply to JW Wang [:jwwang] from comment #17)
> (In reply to Bobby Holley (:bholley) from comment #16)
> > No. The issue is that the stream graph code has a meta runnable that
> > executes a series of callbacks (such as "a stream was created" and "a stream
> > has finished"). The current architecture requires that we respond to each
> > update before processing the next one. On trunk,
> > HTMLMediaElement::NotifyMediaStreamTracksAvailable invokes
> > UpdateReadyStateForData, which pushes us past HAVE_METADATA and gets the
> > ball rolling.
> > 
> > When we hoist update processing into direct tasks on the tail dispatcher, we
> > run into trouble, because both "stream available" and "stream ended" run
> > before we get around to invoking UpdateReadyStateForData. This causes us to
> > get stuck in HAVE_METADATA (because HasVideo is false again). The existing
> > src stream architecture is a bit shallow, but I just want to keep the
> > invariants as the code expects.
> 
> I see. But it is still suspicious to drain all direct tasks in the 1st
> iteration of the loop.

It's not the first iteration, but rather _every_ iteration. Each iteration/sub-task has the potential to add direct tasks to the tail dispatcher. The idea here is to scope the execution of those direct tasks more tightly to the code that generated them.

> The direct listeners waiting for "stream ended"

I'm not sure what you mean by "direct listeners".

> will miss it

Miss what?

> since they are all consumed in the 1st iteration right after "stream
> available" is processed, right?

What is consumed? Direct tasks?

Basically, the point of direct tasks is that they're supposed to be run at the point we fuzzily define as "when the callstack unwinds". When HTMLMediaElement::NotifyMediaStreamTracksAvailable changes a watched variable, we want to recompute our ready state immediately after NotifyMediaStreamTracksAvailable returns, not after an arbitrary number of other tasks from MediaStreamGraphImpl::RunInStableState run, since those other tasks may change things out from under us.
Flags: needinfo?(jwwang)
I see. I didn't consider direct tasks could be generated during each iteration. For example, "stream available" will generate a direct task to recompute ready state as well as "stream ended".
Flags: needinfo?(jwwang)
Comment on attachment 8594404 [details] [diff] [review]
Part 2 - Implement direct tasks. v1

Review of attachment 8594404 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the explanation.
Attachment #8594404 - Flags: review?(jwwang) → review+
Thanks for the reviews! Our discussion on the above made me realize that this
is probably a good idea too.
Attachment #8595682 - Flags: review?(jwwang)
Attachment #8595682 - Flags: review?(jwwang) → review+
Blocks: 1157476
Depends on: 1157488
This was causing us to re-enter RunSyncSectionsInternal and therefore FireTailDispatcher.
Attachment #8596246 - Flags: review?(jwwang)
Comment on attachment 8596246 [details] [diff] [review]
Part 0 - Don't spin the event loop by synchronously dispatching when we're already on the main thread. v1

I guess it makes more sense to do this as part of bug 1157488, since it's necessary to make those assertions green.
Attachment #8596246 - Attachment is obsolete: true
Attachment #8596246 - Flags: review?(jwwang)
(In reply to Bobby Holley (:bholley) from comment #23)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=9cbd6305e508

Looks green. Double-checking with a full try push, then we can get this landed!

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d0e2e841a1e0
Depends on: 1208656
You need to log in before you can comment on or make changes to this bug.