Closed Bug 962719 Opened 6 years ago Closed 4 years ago

ASSERTION: Shouldn't have already notified of finish *and* have output!: '!streamHasOutput[i] || !stream->mNotifiedFinished'

Categories

(Core :: WebRTC: Audio/Video, defect, P4)

x86
macOS
defect

Tracking

()

RESOLVED FIXED
mozilla43
Tracking Status
firefox43 --- fixed
Blocking Flags:

People

(Reporter: florian, Assigned: karlt)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

Attachments

(12 files, 5 obsolete files)

28.76 KB, patch
padenot
: review+
Details | Diff | Splinter Review
4.05 KB, patch
padenot
: review+
Details | Diff | Splinter Review
2.82 KB, patch
padenot
: review+
Details | Diff | Splinter Review
3.10 KB, patch
padenot
: review+
Details | Diff | Splinter Review
2.17 KB, patch
roc
: review+
Details | Diff | Splinter Review
1.03 KB, patch
roc
: review+
Details | Diff | Splinter Review
14.11 KB, patch
padenot
: review+
Details | Diff | Splinter Review
1.33 KB, patch
padenot
: review+
Details | Diff | Splinter Review
3.53 KB, patch
padenot
: review+
Details | Diff | Splinter Review
2.37 KB, patch
roc
: review+
Details | Diff | Splinter Review
3.47 KB, patch
roc
: review+
Details | Diff | Splinter Review
10.96 KB, patch
roc
: review+
Details | Diff | Splinter Review
Very similar to bug 957832, but different steps to reproduce.

Whenever I click the 'Stop Sharing' item I'm adding in bug 885796, I get this assertion twice in my terminal:

###!!! ASSERTION: Shouldn't have already notified of finish *and* have output!: '!streamHasOutput[i] || !stream->mNotifiedFinished', file srcdir/content/media/MediaStreamGraph.cpp, line 398
mozilla::MediaStreamGraphImpl::UpdateCurrentTime()+0x000005bc [NightlyDebug.app/Contents/MacOS/XUL +0x026b724c]
mozilla::MediaStreamGraphImpl::RunThread()+0x0000012c [NightlyDebug.app/Contents/MacOS/XUL +0x026ba46c]
mozilla::(anonymous namespace)::MediaStreamGraphInitThreadRunnable::Run()+0x00000041 [NightlyDebug.app/Contents/MacOS/XUL +0x026c87b1]
nsThread::ProcessNextEvent(bool, bool*)+0x0000069a [NightlyDebug.app/Contents/MacOS/XUL +0x001225fa]
NS_ProcessNextEvent(nsIThread*, bool)+0x000000a8 [NightlyDebug.app/Contents/MacOS/XUL +0x00023068]
mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*)+0x00000278 [NightlyDebug.app/Contents/MacOS/XUL +0x006d9618]
MessageLoop::RunInternal()+0x00000076 [NightlyDebug.app/Contents/MacOS/XUL +0x0065a226]
MessageLoop::RunHandler()+0x00000015 [NightlyDebug.app/Contents/MacOS/XUL +0x0065a135]
MessageLoop::Run()+0x0000002d [NightlyDebug.app/Contents/MacOS/XUL +0x0065a0dd]
nsThread::ThreadFunc(void*)+0x00000157 [NightlyDebug.app/Contents/MacOS/XUL +0x00120bf7]
_pt_root+0x00000165 [NightlyDebug.app/Contents/MacOS/libnss3.dylib +0x0023b9b5]
_pthread_start+0x00000147 [/usr/lib/system/libsystem_c.dylib +0x00014742]

While working on tests for the WebRTC UI in bug 804611, I also noticed that when this assertion happens, the recording-device-events notification is fired twice (I expect it once) for no apparent reason; I'm assuming this is related.

Unless I get reviews quickly, to reproduce this you'll need to apply first the patch from bug 958071 and then the patch from bug 885796.
Blocks: 804611
Bug 958071, bug 885796 and bug 804611 have all landed now. The tests added in bug 804611 contain workarounds for this bug.

I was hoping to at least have someone comment here to say if this assertion looks serious, or like something that we can continue to ignore for a while.
roc: see comment 1 "I was hoping to at least have someone comment here to say if this assertion looks serious, or like something that we can continue to ignore for a while." (given that the bugfixes have landed)
Flags: needinfo?(roc)
The assertion was turned into a warning in bug 957359, so the tests should not require warnings.  In fact it would be easier to debug if the tests didn't contain workarounds, so I can re-enable the assertion and notice if anything fails.
(In reply to Karl Tomlinson (:karlt) from comment #3)

> In fact it would be easier to debug if the tests
> didn't contain workarounds,

The workaround is for the recording-device-events notification that's received twice when it's expected only once, as explained in comment 0.

> so I can re-enable the assertion and notice if anything fails.

In comment 0 I said "Whenever I click the 'Stop Sharing' item I'm adding in bug 885796, I get this assertion twice in my terminal". Any reason why this way to reproduce isn't usable to debug?
(In reply to Florian Quèze [:florian] [:flo] from comment #4)

> In comment 0 I said "Whenever I click the 'Stop Sharing' item I'm adding in
> bug 885796, I get this assertion twice in my terminal". Any reason why this
> way to reproduce isn't usable to debug?

I've just verified that these steps to reproduce still work with an up to date debug build:
1. Load https://apprtc.webrtc.org/ in a tab.
2. Click "Share selected devices" and wait for the image of your webcam to appear
3. Click the green camera icon near the URL bar.
4. Click on the down arrow near 'Continue Sharing'
5. Click 'Stop Sharing'

At this point, this warning appears twice in my terminal:
[Parent 28176] WARNING: Shouldn't have already notified of finish *and* have output!: '!streamHasOutput[i] || !stream->mNotifiedFinished', file /Users/florian/buildhg/mozilla-central/content/media/MediaStreamGraph.cpp, line 407


If you want an _automated_ way to trigger the warning/assertion, you can edit http://mxr.mozilla.org/mozilla-central/source/browser/base/content/test/general/get_user_media.html?force=1#15 and remove the "fake: true", then run:
./mach mochitest-browser browser/base/content/test/general/browser_get_user_media.js
Thanks for the STR.

This try run re-enabled the assertion and ran the test added in bug 804611.
https://tbpl.mozilla.org/php/getParsedLog.php?id=35511324&tree=Try
It reported
> TEST-KNOWN-FAIL | chrome://mochitests/content/browser/browser/base/content/test/general/browser_get_user_media.js | Got the 'recording-device-events' notification twice, likely because of bug 962719

but I don't see this assertion failure in the output.
(In reply to Karl Tomlinson (:karlt) from comment #6)
> Thanks for the STR.
> 
> This try run re-enabled the assertion and ran the test added in bug 804611. [...] I don't see this assertion failure in the output.

See the last paragraph of comment 5. You NEED to remove "fake: true" from the test case for the assertion to be triggered. And you need to run the test locally; AFAIK the test slaves don't have webcams attached (hence the use of fake streams for the test).
If the recording-device-events is unexpectedly fired twice, even when the assertion does not fail, doesn't that imply that the extra notification is unrelated to this assertion failure bug?
(In reply to Karl Tomlinson (:karlt) from comment #8)
> If the recording-device-events is unexpectedly fired twice, even when the
> assertion does not fail, doesn't that imply that the extra notification is
> unrelated to this assertion failure bug?

No.

My assumption is that there's a bug causing some code to be called twice when it shouldn't, and that the code being called twice triggers the assertion for real streams. Fake streams hide the assertion.

After debugging you may conclude that this assumption wasn't correct; but without prior knowledge of this specific piece of code, this assumption seemed reasonable to me.
mStateComputedTime is about 20 - 30 ms ahead of mCurrentTime, as expected.

RecomputeBlocking() uses mStateComputedTime and stream->mFinished to set
stream->mBlocked.

UpdateCurrentTime() sets mNotifiedFinished in the first iteration after
mFinished and stream->mBlocked have been set.
In that iteration and the next, UpdateCurrentTime() uses values in mBlocked
for 10 ms intervals from mCurrentTime to calculate what length of time in that
period the stream was blocked.
The stream is not detected blocked during that period because mCurrentTime has
not yet reached the value mStateComputedTime had when mBlocked was set.

The assertion failing indicates that the finished notification (on the graph
thread) is sent before the last output notifications.

I would like to change the output notifications to be consistent with the
period of stream processing, which is based around mStateComputedTime, not
mCurrentTime.
Assignee: nobody → karlt
Blocks: 956574
No longer blocks: 957832
Duplicate of this bug: 957832
Blocks: 957359
Blocks: 967606
No longer blocks: 967606
See Also: → 1053011
Still relevant?
backlog: --- → webRTC+
Rank: 45
Flags: needinfo?(karlt)
Priority: -- → P4
Yes.  Re-enabling the assertion shows this is still failing.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=021f4e756537

The cause is also involved in bug 956574 and bug 1053011.
Flags: needinfo?(karlt)
I saw this assertion failing intermittently on linux
dom/media/webspeech/synth/test/test_speech_cancel.html at
https://treeherder.mozilla.org/#/jobs?repo=try&revision=aa93c908bf4e
Attachment #8642963 - Flags: review?(padenot)
Attachment #8642965 - Flags: review?(padenot)
The last track comment comes from
https://hg.mozilla.org/mozilla-central/rev/86b2cd5958f3
Attachment #8642967 - Flags: review?(roc)
Attachment #8642968 - Flags: review?(roc)
This does not affect the target time of the video frames, but may mean that more
frames are displayed.

Covering from mProcessedTime to mStateComputedTime ensures that no chunks are
missed.
Attachment #8642971 - Flags: review?(roc)
This provides HAVE_CURRENT_DATA readyState.

The "waiting" event is only for when "Playback has stopped because the next frame is not available, but the user agent expects that frame to become available in due course."

"If the previous ready state was HAVE_FUTURE_DATA or more, and the new ready state is HAVE_CURRENT_DATA or less

    If the media element was potentially playing before its readyState attribute changed to a value lower than HAVE_FUTURE_DATA, and the element has not ended playback, and playback has not stopped due to errors, paused for user interaction, or paused for in-band content, the user agent must queue a task to fire a simple event named timeupdate at the element, and queue a task to fire a simple event named waiting at the element."

Currently NotifiyBlockingChanged() is usually called before
NotifyEvent(EVENT_FINISHED) which is not what HTMLMediaElement wants because
that incorrectly generates a "waiting" event before "ended".

However, in unusual circumstances, and more frequently with the changes here
to make main thread state consistent with processing boundaries,
EVENT_FINISHED can be delivered before NotifyBlockingChanged().  Finished can
be determined immediately after mFinished is set, but blocked is not
determined until RecomputeBlocking() has run.

In that situation, we still need UpdateReadyStateInternal() to run and want
NextFrameStatus() to return NEXT_FRAME_UNAVAILABLE.
Attachment #8642973 - Flags: review?(roc)
Notifications are now up to date with processing, and
MediaStream::GetCurrentTime() now returns "the main-thread's view of how much
data has been processed by this stream", as documented.
Attachment #8642974 - Flags: review?(padenot)
Finished can be determined immediately after mFinished is set, but blocked is
not determined until RecomputeBlocking() has run.
Attachment #8642975 - Flags: review?(padenot)
browser_get_user_media.js was renamed to browser_devices_get_user_media.js in 1d0e1126f1da.  It seems that fixing the cause of assertions here does not resolve the cause of timeout in those tests when removing fake: true from the getUserMedia options.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2cce95815cd4
(In reply to Karl Tomlinson (ni?:karlt) from comment #25)
> browser_get_user_media.js was renamed to browser_devices_get_user_media.js
> in 1d0e1126f1da.  It seems that fixing the cause of assertions here does not
> resolve the cause of timeout in those tests when removing fake: true from
> the getUserMedia options.

As I noted in comment 7, removing fake: true will only help reproducing locally. Using fake devices is required for the test infrastructure.
Comment on attachment 8642971 [details] [diff] [review]
provide video frames to the container as soon as available

Review of attachment 8642971 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/MediaStreamGraph.cpp
@@ +1156,5 @@
> +    if (!chunk)
> +      return;
> +
> +    if(chunk->mFrame != aStream->mLastPlayedVideoFrame) {
> +      OutputVideoFrame(aStream, &chunk->mFrame, start);

I'm confused. Why are we outputting multiple frames here? All but the last one will be ignored, I think.
Attachment #8642971 - Flags: review?(roc) → review-
Attachment #8642962 - Flags: review?(padenot) → review+
Attachment #8642963 - Flags: review?(padenot) → review+
Attachment #8642965 - Flags: review?(padenot) → review+
Attachment #8642974 - Flags: review?(padenot) → review+
Attachment #8642975 - Flags: review?(padenot) → review+
Attachment #8642976 - Flags: review?(padenot) → review+
Attachment #8642961 - Flags: review?(padenot) → review+
Attachment #8642967 - Attachment is obsolete: true
Attachment #8642971 - Attachment is obsolete: true
This avoids needing to find another place to count frame IDs.
Attachment #8644737 - Flags: review?(roc)
Attachment #8644738 - Flags: review?(roc)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #28)
> I'm confused. Why are we outputting multiple frames here?

The goal is to output all frames available so that they can be composited
according to TimeStamp targetTime (and if they are not composited, then frame
dropped counts are available).

> All but the last one will be ignored, I think.

Ah, OK.  SetCurrentImages() only hands off previous images to the ImageClient
immediately if called InImageBridgeChildThread().  Otherwise, as when called
on the thread here, the GetCurrentImages() call happens on the
ImageBridgeChild thread after SetCurrentImageInternal() and races with
subsequent SetCurrentImages() calls.

https://dxr.mozilla.org/mozilla-central/rev/5cf4d2f7f2f2b3df2f1edd31b8bdce7882f3875c/gfx/layers/ImageContainer.cpp#312
https://dxr.mozilla.org/mozilla-central/rev/5cf4d2f7f2f2b3df2f1edd31b8bdce7882f3875c/gfx/layers/ipc/ImageBridgeChild.cpp#420-429

Keeping to a single SetCurrentImages() call per iteration should usually
give the image bridge child time to collect.
Attachment #8644740 - Flags: review?(roc)
I guess overflow will usually take at least 200 days, but AFAIK there is no
disadvantage to using unsigned.
Attachment #8644741 - Flags: review?(roc)
Comment on attachment 8644740 [details] [diff] [review]
provide video frames to the container as soon as available

Review of attachment 8644740 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/MediaStreamGraph.cpp
@@ +1169,5 @@
> +    }
> +    images.AppendElement(ImageContainer::NonOwningImage(image, targetTime));
> +
> +    if (!aStream->mNotifiedFinished) {
> +      aStream->mLastPlayedVideoFrame = *frame;

The meaning of mLastPlayedVideoFrame is unclear now. Here we'll be setting it to the last frame in the queue, but up above we'll skip adding it to the queue, which seems wrong.

@@ +1180,5 @@
>    for (uint32_t i = 0; i < aStream->mVideoOutputs.Length(); ++i) {
>      VideoFrameContainer* output = aStream->mVideoOutputs[i];
>  
> +    for (auto& image : images) {
> +      image.mFrameID = output->NewFrameID();

Setting new frame IDs for every image in the list isn't really right. The same image can appear in multiple SetCurrentFrames calls over time, and each time it should have the same frame ID. Maybe that's a bit hard to do though... setting a new frame ID should be safe but might mean useless work.
Attachment #8644740 - Flags: review?(roc) → review-
Comment on attachment 8644738 [details] [diff] [review]
use VideoFrameContainer::NewFrameID()

I mis-assumed that RenderVideoFrames() dispatched frames only once.
Given it can dispatch the same frame repeatedly, we don't want a different frame id each time.
Attachment #8644738 - Attachment is obsolete: true
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #34)
> > +
> > +    if (!aStream->mNotifiedFinished) {
> > +      aStream->mLastPlayedVideoFrame = *frame;
> 
> The meaning of mLastPlayedVideoFrame is unclear now. Here we'll be setting
> it to the last frame in the queue, but up above we'll skip adding it to the
> queue, which seems wrong.

I was mis-assuming that we only sent frames to the compositor once, and that
the compositor would choose from all those that have been sent, and so this
existing logic was re-used to avoid sending the same frame again.

That assumption was not correct, but this logic is still useful to avoid
assigning a new frame id to the same frame.

> Setting new frame IDs for every image in the list isn't really right. The
> same image can appear in multiple SetCurrentFrames calls over time, and each
> time it should have the same frame ID. Maybe that's a bit hard to do
> though... setting a new frame ID should be safe but might mean useless work.

Each PlayVideo() call processes an adjacent but distinct interval so the
mLastPlayedVideoFrame logic provides that the same frame is not added again
with what would be a new frame id.

With the other patches here, the notifications, time updates, and processing
all use the same time interval.  This interval is a little in advance of the
current time so that the frames are ready in time, but now that all intervals
align, we can't examine stream times or data prior to the interval and so
can't go back as far as the current time.  Trying to keep old stream data with
up-to-date notifications by using a different time interval for
AdvanceTimeVaryingValuesToCurrentTime() would require analysing blocking over
two different intervals, one for notifications and one for
AdvanceTimeVaryingValuesToCurrentTime(), which I'd like to avoid.

However the frames that we need to keep from the previous iteration are still
on the VideoFrameContainer, so this patch collects them from there.
Attachment #8646167 - Flags: review?(roc)
Attachment #8644740 - Attachment is obsolete: true
Comment on attachment 8646167 [details] [diff] [review]
provide video frames to the container as soon as
available v3

Review of attachment 8646167 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/MediaStreamGraph.cpp
@@ +1190,5 @@
> +    output->GetImageContainer()->GetCurrentImages(&previousImages);
> +    uint32_t j = previousImages.Length();
> +    for ( ; j--; ) {
> +      if (previousImages[j].mTimeStamp < currentTimeStamp)
> +        break;

If previousImage[j].mTimeStamp < currentTimeStamp the frame could still be relevant, if it's the most recent frame before currentTimeStamp and the first timestamp of the frames in the graph is > currentTimeStamp.

Right?
Attachment #8646167 - Attachment description: provide video frames to the container as soon as available v3 → provide video frames to the container as soon as available v3
Attachment #8646167 - Flags: review?(roc) → review-
Depends on: 1194558
dom/canvas/test/test_capture.html noticed that multiple images are left in the
container when no more frames are generated.

Although I assume the test could be made to pass by returning the last image
from AutoLockImage::GetImage(), it seems to be the responsibility of the
producer to remove expired images, so I've added support for that.

This version also avoids overlapping timestamps should there be large time
skew.  It takes the simple approach of dropping frames.  I don't know whether
it would be worth scheduling the frames anyway with the next frame's timestamp
to support counting frame dropping.

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #37)
> > +    uint32_t j = previousImages.Length();
> > +    for ( ; j--; ) {
> > +      if (previousImages[j].mTimeStamp < currentTimeStamp)
> > +        break;
> 
> If previousImage[j].mTimeStamp < currentTimeStamp the frame could still be
> relevant, if it's the most recent frame before currentTimeStamp and the
> first timestamp of the frames in the graph is > currentTimeStamp.
> 
> Right?

Correct.  The code was designed to keep the most recent frame before
currentTimeStamp.  I've added a comment to indicate that, and fixed the
loop to leave j at 0 when all frames have timestamp > currentTimeStamp;

The previous version of the patch used the start time of each chunk for
newImages.  If there were multiple tracks, that start time may have been
before currentTimeStamp and so keeping the most recent old frame may have
sometimes been unnecessary.  However, that approach for start time could
result in timestamps out of order with multiple tracks.  This version fixes
that.  If the last track finishes, then it's last frame should be played until
the point it finishes instead of replacing it earlier when a frame from
another track starts earlier.  Timestamps are now always > currentTimeStamp.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=3791153a14ce
Attachment #8647852 - Flags: review?(roc)
Attachment #8646167 - Attachment is obsolete: true
Attachment #8647852 - Attachment description: provide video frames to the container as soon as available v4 → provide video frames to the container as soon as available v4
Attachment #8647852 - Flags: review?(roc) → review+
With the workaround for bug 1181006, attachment 8386517 [details] notices a 20 ms reduction in Web Audio currentTime delay.

It should be simple to test that currentTime aligns with 128 sample blocks, but currently that is only true when there is more than one node due to ExtraCurrentTime(), as tracked in bug 1053011.
Blocks: 1053011
Flags: needinfo?(roc) → in-testsuite?
Keywords: perf
You need to log in before you can comment on or make changes to this bug.