When entering to dormant, mVideoRequestStatus is not cleared

RESOLVED FIXED in Firefox 36

Status

()

P1
normal
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: sotaro, Assigned: cpearce)

Tracking

Trunk
mozilla38
x86_64
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox35 unaffected, firefox36 fixed, firefox37 fixed, firefox38 fixed)

Details

Attachments

(4 attachments, 1 obsolete attachment)

(Reporter)

Description

4 years ago
When entering to dormant, mVideoRequestStatus is not cleared. It causes problem to video decoding after dormant exit.
(Reporter)

Updated

4 years ago
Assignee: nobody → sotaro.ikeda.g
Priority: P1 → --
(Reporter)

Updated

4 years ago
Blocks: 1050031
(Reporter)

Comment 1

4 years ago
Created attachment 8554105 [details] [diff] [review]
patch - Clear mVideoRequestStatus and mAudioRequestStatus
(Reporter)

Updated

4 years ago
Attachment #8554105 - Flags: review?(cpearce)
Comment on attachment 8554105 [details] [diff] [review]
patch - Clear mVideoRequestStatus and mAudioRequestStatus

Review of attachment 8554105 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/MediaDecoderStateMachine.cpp
@@ +2763,5 @@
>        mPendingWakeDecoder = nullptr;
>        {
>          ReentrantMonitorAutoExit exitMon(mDecoder->GetReentrantMonitor());
>          // Wait for the thread decoding, if any, to exit.
>          DecodeTaskQueue()->AwaitIdle();

Doesn't DecodeTaskQueue()->AwaitIdle() guarantee all decoding tasks will be finished and m{Audio,Video}RequestStatus will be reset to idle?
(Assignee)

Comment 3

4 years ago
Created attachment 8554333 [details] [diff] [review]
Patch: Help increase chances of reproing failure

This patch makes it more likely to reproduce the stall.
(Assignee)

Comment 4

4 years ago
I think resetting the request status is treating the symptom, not the root cause.

I think what is happening, is that MediaDecoderStateMachine::EnsureAudioDecodeTaskQueued() (or EnsureVideoDecodeTaskQueued()) sets the video/audio request status to pending, and then dispatches a task to call DecodeAudio or DecodeVideo, and before we run that task, we switch to dormant mode, and execute RunStateMachine in the DORMANT case and call FlushDecoding(), which does a FlushAndDispatch() on the decode task queue, which drops the task to run DecodeVideo(), and so we never get out of pending state for the audio or video decode, and so playback can't continue after the seek. DecodeVideo() has a case that resets the request status if the state machine's state has changed.

I think the decode task that's being flushed is a decode task dispatched to preroll MediaData dispatched after the seek completes but before we've had a chance to switch to dormant state.

So the problem is that in FlushDecoding() we drop the task that would reset the request status field. 

I think what we need to do instead of this patch is change MediaDecoderStateMachine::FlushDecoding().

In FlushDecoding(), change:

    DecodeTaskQueue()->FlushAndDispatch(task);

to 
    DecodeTaskQueue()->Dispatch(task);
    DecodeTaskQueue()->AwaitIdle();

This means we won't drop the task that ensures we reset the decode status. It does however mean that the decoder may end up running more tasks that it did previously. This is probably OK, but it means our shutdown and seeking may not be quite so fast; but it should also mean that the Readers never have their tasks spontaneously disappear, which may in fact improve their reliability.

Bobby: Does changing the FlushAndDispatch to a Dispatch();AwaitIdle(); in MDSM:FlushDecoding() sound reasonable to you?

We should (in another bug!) perhaps remove MediaTaskQueue::FlushAndDispatch() for safety's sake; there's only one other callsite in the WMFMediaDataDecoder.
Flags: needinfo?(bobbyholley)
(Assignee)

Comment 5

4 years ago
Comment on attachment 8554105 [details] [diff] [review]
patch - Clear mVideoRequestStatus and mAudioRequestStatus

Review of attachment 8554105 [details] [diff] [review]:
-----------------------------------------------------------------

I think we should not use MediaTaskQueue::FlushAndDispatch(). We should Dispatch() and then AwaitIdle() in FlushDecoding() instead.

Ideally we'd make ResetDecode() return a promise. But that's a more complicated patch that we should do at a later time, in another bug.

::: dom/media/MediaDecoderStateMachine.cpp
@@ +2763,5 @@
>        mPendingWakeDecoder = nullptr;
>        {
>          ReentrantMonitorAutoExit exitMon(mDecoder->GetReentrantMonitor());
>          // Wait for the thread decoding, if any, to exit.
>          DecodeTaskQueue()->AwaitIdle();

jwwang: There's the FlushDecoding() call above that flushes the decode task queue which prevents the task that would reset the request status from running.
Attachment #8554105 - Flags: review?(cpearce) → review-
Comment on attachment 8554105 [details] [diff] [review]
patch - Clear mVideoRequestStatus and mAudioRequestStatus

Review of attachment 8554105 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/MediaDecoderStateMachine.cpp
@@ +2763,5 @@
>        mPendingWakeDecoder = nullptr;
>        {
>          ReentrantMonitorAutoExit exitMon(mDecoder->GetReentrantMonitor());
>          // Wait for the thread decoding, if any, to exit.
>          DecodeTaskQueue()->AwaitIdle();

Does this code predate the MediaPromise? Bug 1123983 indicates media promises will always be resolved or rejected during FlushDecoding().
(Assignee)

Comment 7

4 years ago
Yes, though it was modified recently in bug 1123983.

But also note that MediaPromises' tasks to dispatch reject/resolves are immune to flushing. That is, a MediaTaskQueue::Flush() won't flush a MediaPromise's resolve/reject tasks, they still run. However in this case the task that's responsible for *calling* RequestVideoData (a DecodeVideo() function call task) is a normal task and it *is* being Flushed. Because that task is flushed, it doesn't reset the request status fields, and they remain in pending state.
(In reply to Chris Pearce (:cpearce) from comment #4)
> Bobby: Does changing the FlushAndDispatch to a Dispatch();AwaitIdle(); in
> MDSM:FlushDecoding() sound reasonable to you?

Absolutely. The whole "throw away tasks" thing always creeped me out, which is why I engineered promises to override that behavior. Getting rid of it entirely (in favor of more explicitly cancelable things in places where it's needed) seems much better.
Flags: needinfo?(bobbyholley)
(Assignee)

Comment 9

4 years ago
OK, let's see if anything breaks then:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=efa4b41d6a3c
(Reporter)

Comment 10

4 years ago
(In reply to Chris Pearce (:cpearce) from comment #4)
> I think what we need to do instead of this patch is change
> MediaDecoderStateMachine::FlushDecoding().
> 
> In FlushDecoding(), change:
> 
>     DecodeTaskQueue()->FlushAndDispatch(task);
> 
> to 
>     DecodeTaskQueue()->Dispatch(task);
>     DecodeTaskQueue()->AwaitIdle();

cpearce, your change worked for me. Can you take this bug?
Flags: needinfo?(cpearce)
(Assignee)

Comment 11

4 years ago
Yes, I'll take this bug.
Assignee: sotaro.ikeda.g → cpearce
Flags: needinfo?(cpearce)
(Assignee)

Comment 12

4 years ago
Created attachment 8554717 [details] [diff] [review]
Patch

Don't flush the decode task queue in MDSM::FlushDecoding().
Attachment #8554105 - Attachment is obsolete: true
Attachment #8554717 - Flags: review?(bobbyholley)
Comment on attachment 8554717 [details] [diff] [review]
Patch

Review of attachment 8554717 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with a followup filed to get rid of the flushing machinery in MediaTaskQueue.
Attachment #8554717 - Flags: review?(bobbyholley) → review+
(Assignee)

Comment 14

4 years ago
I thought you might say that! Bug 1125970.
sorry had to back this out in https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=043a03d002f2 seems one of this changes caused a Crashtest bustage on Windows Tests like:

https://treeherder.mozilla.org/logviewer.html#?job_id=5955531&repo=mozilla-inbound
Flags: needinfo?(cpearce)

Updated

4 years ago
Priority: -- → P1
(Reporter)

Comment 19

4 years ago
It might be better to check-in this bug's patch separately from Bug 1123535.
(Assignee)

Comment 21

4 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #19)
> It might be better to check-in this bug's patch separately from Bug 1123535.

Yes. I will land if the try push is green. I will try to debug the failures tomorrow.
(Assignee)

Comment 23

4 years ago
Ralph: will need uplift here to support dormant mode for memory reduction strategy.
Flags: needinfo?(giles)
Looks like this was the one responsible for the mochitest failures noted in comment 18. Backed out again.
https://hg.mozilla.org/integration/mozilla-inbound/rev/4ed122041670
(Assignee)

Comment 25

4 years ago
I don't see any failures on m-i, and there weren't any in the try run. Can you link them here?
Flags: needinfo?(ryanvm)
(Assignee)

Comment 27

4 years ago
With Sotaro's original patch, this is green, except for a web-platform test that is already known to be bad gets worse:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c412adf8db4d
(Assignee)

Comment 28

4 years ago
Created attachment 8556248 [details] [diff] [review]
Patch

I think what the problem with the Linux boxes, is that they still use the synchronous decode adapter. In MediaDecoderReader::RequestVideoData if the DecodeVideoFrame() call returns false (say because we shutdown the MediaResource) and we happen to have been skipping to keyframe, we'll dispatch a task to call RequestVideoData again. Since we're no longer flushing the decode task queue, this means we could never have an idle task queue since when the task is running it will always fail in the DecodeVideoFrame call and then dispatch another task to try again. So we'd time out.

This patch is based on top of my previous patch.
Attachment #8556248 - Flags: review?(matt.woodrow)
Comment on attachment 8556248 [details] [diff] [review]
Patch

Review of attachment 8556248 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/MediaDecoderReader.cpp
@@ +189,5 @@
>      mReader->RequestVideoData(skip, mTimeThreshold);
>      return NS_OK;
>    }
> +
> +  void Cancel() {

Can we assert that this is only every called from the task queue? The MOZ_ASSERT above could fail if it isn't.

@@ +276,5 @@
>  }
>  
> +nsresult MediaDecoderReader::ResetDecode()
> +{
> +  nsresult res = NS_OK;

I know you just moved this, but why is this a local var? We can just return NS_OK.
Attachment #8556248 - Flags: review?(matt.woodrow) → review+
(Assignee)

Comment 31

4 years ago
Yup, will assert in RequestVideoWithSkipTask::Cancel() (and Run() too) that we're on the task queue, and remove the local. Thanks!
(Assignee)

Comment 32

4 years ago
I am not convinced that try push is green enough, so I push Sotaro's patch instead and follow up.

Sotaro's patch Try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9888dba8f3bc
https://hg.mozilla.org/mozilla-central/rev/a607fdc02c5d
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Created attachment 8556572 [details] [diff] [review]
Reset video request status on reset decode.

Sotaro's patch as landed. Requesting uplift of just this change.

Approval Request Comment
[Feature/regressing bug #]: MSE
[User impact if declined]: Youtube playback uses more resources.
[Describe test coverage new/current, TreeHerder]: Landed on m-c.
[Risks and why]: Low. This does affect non-MSE playback but is a small and isolated change.
[String/UUID change made/needed]: None.
Flags: needinfo?(giles)
Attachment #8556572 - Flags: approval-mozilla-beta?
Attachment #8556572 - Flags: approval-mozilla-aurora?
Attachment #8556572 - Flags: approval-mozilla-beta?
Attachment #8556572 - Flags: approval-mozilla-beta+
Attachment #8556572 - Flags: approval-mozilla-aurora?
Attachment #8556572 - Flags: approval-mozilla-aurora+
https://hg.mozilla.org/releases/mozilla-aurora/rev/2822ac22aae1
status-firefox35: --- → unaffected
status-firefox36: --- → affected
status-firefox37: --- → fixed
status-firefox38: --- → fixed
You need to log in before you can comment on or make changes to this bug.