Closed Bug 1536766 Opened 5 years ago Closed 5 years ago

No video ended event fired when video file has a shorter audio track

Categories

(Core :: Audio/Video: Playback, defect, P2)

65 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla68
Tracking Status
firefox-esr60 --- unaffected
firefox66 --- wontfix
firefox67 --- wontfix
firefox68 --- fixed

People

(Reporter: luca, Assigned: pehrsons)

References

(Regression)

Details

(Keywords: regression)

Attachments

(15 files)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36

Steps to reproduce:

Our application waits for certain videos to fire the "ended" event, when a video has played until its end.

Actual results:

Since Firefox version 65, the ended event will not occur, if the videos audio track is shorter than the video itself. Infact firefox doesn't seem to play the video to its end at all.

we discovered this with this video:
http://ostsee-life.nabu.de/scenes/S_100/360/4096x2048/S_100_luftaufnahme_4096x2048_0_0.mp4

and also its .webm equivalent.

the audio seems to be about 700ms shorter.

ff version 61 to 64 work just fine. 65 and 66 are stopping the playback too early.

Expected results:

The video ended event should be fired

the previously linked video has been updated. the now faulty file has been moved to http://ostsee-life.nabu.de/scenes/S_100/360/4096x2048/bad/S_100_luftaufnahme_4096x2048_0_0.mp4

Component: Untriaged → Audio/Video: Playback
Product: Firefox → Core

:alwu, sounds like a regression following your changes with audio looping.

Flags: needinfo?(alwu)
Assignee: nobody → alwu
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true

Hi, Luca,
I can't reproduce the issue with latest Nightly 68.0a1 (2019-03-20) on OSX 10.12.6.
The ended event was dispatched correctly and I used the video link you provided in comment1.

Status: ASSIGNED → UNCONFIRMED
Ever confirmed: false
Flags: needinfo?(alwu) → needinfo?(luca)

You can check by this url

BTW, in Firefox Release 65.0.1, the ended event is fired correctly as well.

Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Assignee: alwu → nobody
Status: ASSIGNED → UNCONFIRMED
Ever confirmed: false

Thanks for your example. You are right, the problem seems to be a little more specific.

In our engine, the video-element is created in javascript and is not added to the dom. We use it to render it into a WebGL-Context in order to create a 360°-View. We also use the WebAudioApi and "createMediaElementSource()" to route the videos Audio through our sound engine.

I have extended your Demo by creating the video node in javascript AND routing the sound into Webaudio (it seems to only occur in this combination). I then won't receive a "ended" event on firefox-versions >= 65 with this demo, i do receive the event with versions =< 64

Here is the link to my demo (feel free to download the html and linked video, if you need to):
http://luca.kfcloud.de/firefoxVideoEndedDemo/demo.html

PS: In the demo i skip towards the end of the video, so you don't have to listen to the whole thing.

PPS: I hosted the video separatly, because hooking it into web-audio would produce cross origin-problems otherwise

Flags: needinfo?(luca)

PPS: Example wont work in Browsers that require user interaction to start webaudio contexts (like chrome). It also works in Microsofts Edge for exampe ;)

(In reply to Luca from comment #6)

PS: In the demo i skip towards the end of the video, so you don't have to listen to the whole thing.

PPS: I hosted the video separatly, because hooking it into web-audio would produce cross origin-problems otherwise

This will not work in either chrome or FF version 67 and later due to block autoplay.

You call videoElement.play(); which will prevent play() from working.

You need to test if the promise is rejected.

(In reply to Luca from comment #7)

PPS: Example wont work in Browsers that require user interaction to start webaudio contexts (like chrome). It also works in Microsofts Edge for exampe ;)

A work-around would be to mute the video element.

The result of the preliminary investigation is that, the video sink stopped consuming video data from video queue after audio reached to the end, which caused no more video decoding task to be dispatched and the video decoding couldn't reach to the end.

I'm still investigating why we stop comsuming the video data and why didn't dispatch any decoding task after that.

This issue looks not related with audio looping, it seems more related with using decoded stream.

Assignee: nobody → alwu
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true

The reason that we didn't dispatch the ended event is because video sink stopped consuming data, so that we couldn't decode video to the end, which causes decoding was never ended.

And the reason that video sink stopped consuming data is because the clock time [1] stopped increasing after audio reached to the end. The clock time was gotten from [2], which showed that the mLastOutputTime didn't be updated. That is, after audio reahced to the end, MSG seems stopping calling NotifyOutput() [3] which would use media source event to call DecodedStream::NotifyOutput() [4], which would update mLastOutputTime.

[1] https://searchfox.org/mozilla-central/rev/486b64e4bff86b7988af8c8b80845404ad197533/dom/media/mediasink/VideoSink.cpp#516
[2] https://searchfox.org/mozilla-central/rev/486b64e4bff86b7988af8c8b80845404ad197533/dom/media/mediasink/DecodedStream.cpp#751
[3] https://searchfox.org/mozilla-central/rev/486b64e4bff86b7988af8c8b80845404ad197533/dom/media/mediasink/DecodedStream.cpp#92
[4] https://searchfox.org/mozilla-central/rev/486b64e4bff86b7988af8c8b80845404ad197533/dom/media/mediasink/DecodedStream.cpp#417


Hi, Paul,
Could you help me check the MSG part to see why the NotifyOutput() didn't be called after audio reached to the end?
Thank you!

Flags: needinfo?(padenot)

Hmm, wait, we only captured the audio stream, so it makes sense to not call NotofyOutput() after there is no audio.

Flags: needinfo?(padenot)

Now I have patches to fix this issue, I'm planing to use AudioSinkWrapper to manager both AudioSink and DecodedStream, which should return correct clock time during different situations. I'm testing my patches and will request review later.

:alwu, any ideas which changes caused the regression?

I'm thinking some of the MediaStream/MSG changes caused it, :phersons?

Just asking as it would ensure the ideal fix.

Flags: needinfo?(apehrson)

I guess that this issue has existed for a while since we introduced the DecodedStream. Because DecodedStream doesn't consider the case to return increasing clock time after audio ended when audio is shorter than video.

Maybe this check [1] needs to account for an ended audio track and thus switch to video? It was part of bug 1423241 and the shift from a stream listener to track listeners, and landed in 65.

Like Alastor said we're only capturing audio here, but I don't think that makes a difference to MediaDecoder or DecodedStream. They will always process both tracks if present.

[1] https://searchfox.org/mozilla-central/rev/a7315d78417179b151fef6108f2bce14786ba64d/dom/media/mediasink/DecodedStream.cpp#93-97

Flags: needinfo?(apehrson)

We could also pad the audio track with silence, and delay the ending of the audio track in the MSG, until the video track has also ended. This could have side effects if there's a second, still live, audio track and both are fed to web audio through a MediaStreamAudioSourceNode. I'm not sure right now which approach would be easier.

We've already had a place to handle this kinds of case [1], so I think it would be good to let AudioSinkWrapper to manager DecodedStream as well.
[1] https://searchfox.org/mozilla-central/rev/a7315d78417179b151fef6108f2bce14786ba64d/dom/media/mediasink/AudioSinkWrapper.cpp#79

Has STR: --- → yes

(In reply to Alastor Wu [:alwu] from comment #18)

We've already had a place to handle this kinds of case [1], so I think it would be good to let AudioSinkWrapper to manager DecodedStream as well.
[1] https://searchfox.org/mozilla-central/rev/a7315d78417179b151fef6108f2bce14786ba64d/dom/media/mediasink/AudioSinkWrapper.cpp#79

As DecodedStream goes as an AudioSink (as far as VideoSink is concerned it's on the same level as AudioSinkWrapper) it feels natural that it only cares about reporting audio position. But it also processes video, sending video frames to a video track, so somehow it too needs access to the progressing video clock after audio has finished in order to send out the last video frames. Does your solution accommodate that?

(In reply to Andreas Pehrson [:pehrsons] from comment #19)

(In reply to Alastor Wu [:alwu] from comment #18)

We've already had a place to handle this kinds of case [1], so I think it would be good to let AudioSinkWrapper to manager DecodedStream as well.
[1] https://searchfox.org/mozilla-central/rev/a7315d78417179b151fef6108f2bce14786ba64d/dom/media/mediasink/AudioSinkWrapper.cpp#79

As DecodedStream goes as an AudioSink (as far as VideoSink is concerned it's on the same level as AudioSinkWrapper) it feels natural that it only cares about reporting audio position.

I agree that it feels convoluted that an AudioSink would need to know about the video position. That is not the job of an AudioSink.
And while that approach may/will work, I don't think it should be the preferred one for the sake of expediency.

(In reply to Andreas Pehrson [:pehrsons] from comment #19)

As DecodedStream goes as an AudioSink (as far as VideoSink is concerned it's on the same level as AudioSinkWrapper) it feels natural that it only cares about reporting audio position. But it also processes video, sending video frames to a video track, so somehow it too needs access to the progressing video clock after audio has finished in order to send out the last video frames. Does your solution accommodate that?

That is what AudioSinkWrapper did now, it would calculate position depending on the playback rate when audio has ended, which could make video sink get correct clock time to consume video frame.

What I would like to do is not to change any functionality inside the DecodedStream, to make a wrapper to handle all other audio position non-related stuffs.

(In reply to Jean-Yves Avenard [:jya] from comment #20)

I agree that it feels convoluted that an AudioSink would need to know about the video position. That is not the job of an AudioSink.
And while that approach may/will work, I don't think it should be the preferred one for the sake of expediency.

It seems to me that we should move the check [1] to VideoSink and to make VideoSink switching to calculate the position after audio ended. Then AudioSinkWrapper could be used to handle some difference between DecodedStream and AudioSink.

WDYT?

[1] https://searchfox.org/mozilla-central/rev/a7315d78417179b151fef6108f2bce14786ba64d/dom/media/mediasink/AudioSinkWrapper.cpp#79

I'm thinking about the following concept.

AudioSinkWrapper is responsible to return audio clock time for media with or without audio (using playback rate to calculate the clock)

So for the level of the audio sink, it actually doesn't need to know about video position, the function [1] here should be renamed to GetSystemClock() or something similar.

VideoSink contains AudioSinkWrapper, it could always rely the clock time AudioSinkWrapper returns. AudioSink and DecodedStream are two dedicated sink to consume audio data, so we can let AudioSinkWrapper to manage them, so that it could know when we should switch to system clock in order to return the correct time.

[1] https://searchfox.org/mozilla-central/rev/a7315d78417179b151fef6108f2bce14786ba64d/dom/media/mediasink/AudioSinkWrapper.cpp#79

If we capture audio for a media element without audio track, what would happen? Would we append silence to the track?

Because AudioSinkWrapper won't create AudioSink when media doesn't have audio track, I'm thinking about whether we have to create DecodedStream in this case if we let AudioSinkWrapper to manage DecodedStream.

Flags: needinfo?(apehrson)

hmm, DecodedStream would also be responsible to return video ended promise [1]. Now I'm not sure that it's a good idea to AudioSink and DecodedStream...

[1] https://searchfox.org/mozilla-central/rev/3d469329a42644b41e66944d8da22d78d8c0f5ec/dom/media/mediasink/DecodedStream.cpp#336

(In reply to Alastor Wu [:alwu] from comment #23)

If we capture audio for a media element without audio track, what would happen? Would we append silence to the track?

Because AudioSinkWrapper won't create AudioSink when media doesn't have audio track, I'm thinking about whether we have to create DecodedStream in this case if we let AudioSinkWrapper to manage DecodedStream.

If there's no audio track decoded by MediaDecoder, DecodedStream won't be outputting any audio track either. It'll be running off the video track's NotifyOutput instead.

Flags: needinfo?(apehrson)

(In reply to Jean-Yves Avenard [:jya] from comment #20)

(In reply to Andreas Pehrson [:pehrsons] from comment #19)

(In reply to Alastor Wu [:alwu] from comment #18)

We've already had a place to handle this kinds of case [1], so I think it would be good to let AudioSinkWrapper to manager DecodedStream as well.
[1] https://searchfox.org/mozilla-central/rev/a7315d78417179b151fef6108f2bce14786ba64d/dom/media/mediasink/AudioSinkWrapper.cpp#79

As DecodedStream goes as an AudioSink (as far as VideoSink is concerned it's on the same level as AudioSinkWrapper) it feels natural that it only cares about reporting audio position.

I agree that it feels convoluted that an AudioSink would need to know about the video position. That is not the job of an AudioSink.
And while that approach may/will work, I don't think it should be the preferred one for the sake of expediency.

While this is true for idealistic reasons, this isn't how DecodedStream has been working in the past. DecodedStream will de facto have to feed video to its video track, so it will need access to video time as well. It would be unnecessarily complicated to break the video clock out of DecodedStream, just to hand it another way of getting the video clock so it can still feed video.

I say we keep the status quo and let DecodedStream drive the clock after audio ends too. This is what AudioSinkWrapper does, so we keep these two at the same level.

We can refactor this in the future, but it'll have to involve breaking out the feeding of the audio track to something on the same level as AudioSink (not AudioSinkWrapper), and the feeding of video to a dedicated VideoSink. HTMLMediaElement.captureStream() even says to keep playing the audio through the regular audio sink and feed it to an audio track, and to accomodate this a refactor like this is inevitable.

Unless anyone objects, I'll take this bug and fix the regression, and anything else that comes up from the tests I'll be adding; and we'll push the refactor of making DecodedStream a pure audio sink onto the future.

I've confirmed the regression range with mozregression.

Blocks: 1423241
Has Regression Range: --- → yes
Rank: 13
Priority: -- → P2

According to comment27, pass this bug to Andreas, thank you!

Assignee: alwu → apehrson
No longer blocks: 1423241
Regressed by: 1423241

I was in a deep rabbit hole making sure we could append and process precisely the right time of data in the MSG, but I think I made it out: https://treeherder.mozilla.org/#/jobs?repo=try&revision=da98d0eafe3df69843eaf9ddfad3275e3e2b46f4

There were some other bugs as well with weirdly formed media files among our tests, but we should behave more like VideoSink in what frames we pass on now.

FWIW there's an intermittent on 3 of the test cases left, that doesn't reproduce locally:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=96bf908f6f9f029e668314d2334389072371a7bd

There are a few shutdown hangs remaining after that, that I hope are not my fault, and hopefully fixed on the latest m-c.

This is similar to what VideoSink does, and is required for us to end playback
on HAVE_CURRENT_DATA (a frame for currentTime must be available).

Null frames could still be valid. It's how DecodedStream signals a pause. They
could also be valid and have an intrinsic size and the force-black flag set.

This is necessary to get an accurate total track duration in case DecodedStream
does a lot of seeking and pausing.

This also removes previous code for avoiding blocking a video track, as this new
code handles that case too.

A case where this wasn't working was bipbop-lateaudio.mp4, where the last frame
has duration 0 and starts and ends before the previous frame has ended. The
VideoSink still renders this frame at the end of playback, so this patch
brings DecodedStream closer to that behavior by rendering all frames with a
start time after the previous frame's start time. The track's duration is still
based on absolute times so things don't blow up.

streamCurrentTime is relative to the start of the stream, as is the track's
segment. We need to account for the track's start time in the stream when
notifying of output.

This gives us a guarantee that the first frame of a media file can be rendered
with a second media element and mozCaptureStream(), even if the file is very
very short.

With longer video-only files there were also cases where nothing ended up being
rendered. Probably because the MediaStreamGraph ended up switching from an
AudioCallbackDriver to a SystemClockDriver and this took enough time to put the
SourceMediaStream::EndTrack and the SourceMediaStream::AddTrackListener calls
for this video track to be processed in the same iteration. The listener would
then always lose to the ending track and update main thread state too late,
leading to its media element not rendering any frames and nasty intermittent
failures.

FWIW, the rev without the last patch, looks like this: https://treeherder.mozilla.org/#/jobs?repo=try&revision=96bf908f6f9f029e668314d2334389072371a7bd

I managed to reproduce this under rr locally by increasing PARALLEL_TESTS to 4.

What happened that caused the errors was that SourceMediaStream::EndTrack (from DecodedStream) and several SourceMediaStream::AddDirectTrackListener/AddTrackListener (from DecodedStream, HTMLMediaElement) ended up being processed in the same MSG iteration.

Because of the SwapMessageQueues that happens at the end of the iteration, the listeners were only really added during the following iteration.

The current iteration meanwhile sent a NotifyEnded() to an existing listener (from MediaStreamTrack) which updated the track's main thread state, and made the HTMLMediaElement end playback. The listeners that took until the next iteration to get added would have notified about initial frame size, and provide the link to the VideoFrameContainer, but they were ignored due to the media element already being ended.

This could also be fixed by MediaStreamGraph processing the ControlMessages at the beginning of an iteration. I am not sure of the background behind why things are as they are however, so I'd be keen to hear more during review. Seeing that we long-term will remove SourceMediaStreams, break video out from the graph, and make all audio in the graph pull-only, this will not really be needed. However, because audio will be pull-only and because tracks shouldn't end at seeks or pauses per spec, we'll need a mechanism similar to this, that ends tracks only once playout of their content have finished, anyway. I think this is a pretty good compromise for the time being.

Depends on: 1543980
Attachment #9057847 - Attachment description: Bug 1536766 - Assert that time always goes forwards in DecodedStream. r?jya → Bug 1536766 - Assert that time always goes forward in DecodedStream. r?jya
Blocks: 1544465
Pushed by pehrsons@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/3e0e3030314d
Switch test_streams_element_capture.html to test all gPlayTests. r=jya
https://hg.mozilla.org/integration/autoland/rev/4f6d240c210d
Re-label two files in gPlayTests as video, since they contain video tracks. r=jya
https://hg.mozilla.org/integration/autoland/rev/ab27d9f5902a
Fix MediaStream currentTime wrt starting late in a graph's timeline. r=jya
https://hg.mozilla.org/integration/autoland/rev/6386ed1b7d74
Follow the video clock when audio ends early in DecodedStream. r=jya
https://hg.mozilla.org/integration/autoland/rev/000260ba28de
Send a black dummy frame out when a video track that contained no frames ends. r=jya
https://hg.mozilla.org/integration/autoland/rev/36f99fa3c956
Fix a rounding error in DecodedStream. r=jya
https://hg.mozilla.org/integration/autoland/rev/87616997f160
Don't ignore null frames in MediaStreamGraph. r=padenot
https://hg.mozilla.org/integration/autoland/rev/f9d1f1bfe2b1
Handle DecodedStream's resets when calculating VideoChunk durations in the MSG. r=padenot
https://hg.mozilla.org/integration/autoland/rev/199efe6aec59
Assert that time always goes forward in DecodedStream. r=jya
https://hg.mozilla.org/integration/autoland/rev/25f3a33ec51b
Remove unused MediaStream::HasCurrentData. r=padenot
https://hg.mozilla.org/integration/autoland/rev/f2923dfcf33c
Better handle overlapping video frames in DecodedStream::SendVideo. r=jya
https://hg.mozilla.org/integration/autoland/rev/addbb04415cb
Make NotifyOutput from SourceMediaStream tracks reflect amount of data in the track. r=padenot
https://hg.mozilla.org/integration/autoland/rev/63fc85885060
Drop frames more accurately when ending and interrupting video track playback. r=padenot
https://hg.mozilla.org/integration/autoland/rev/72d37a08f281
End a track only after the graph has reported reaching its end time in DecodedStream. r=jya,padenot
Backout by aciure@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/41f1dcbe9caa
Backed out 14 changesets for causing RunWatchdog and mediaElementAudioSource wpt failures CLOSED TREE

Thanks for the backout. The RunWatchdog failures are bug 1543980 that I forgot to add a workaround for before pushing. I've done that locally now.

The mediaElementAudioSource WPT were completely unexpected however. I'll try to assess whether they're intermittent and whether we should fix them or update their result expectations.

Flags: needinfo?(apehrson)
Pushed by pehrsons@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/932ae0651492
Switch test_streams_element_capture.html to test all gPlayTests. r=jya
https://hg.mozilla.org/integration/autoland/rev/d6286692b29a
Re-label two files in gPlayTests as video, since they contain video tracks. r=jya
https://hg.mozilla.org/integration/autoland/rev/b30dd5995d98
Fix MediaStream currentTime wrt starting late in a graph's timeline. r=jya
https://hg.mozilla.org/integration/autoland/rev/67442b20a350
Follow the video clock when audio ends early in DecodedStream. r=jya
https://hg.mozilla.org/integration/autoland/rev/7a96da026057
Send a black dummy frame out when a video track that contained no frames ends. r=jya
https://hg.mozilla.org/integration/autoland/rev/0bdfa8fcd3c2
Fix a rounding error in DecodedStream. r=jya
https://hg.mozilla.org/integration/autoland/rev/46eb05673835
Don't ignore null frames in MediaStreamGraph. r=padenot
https://hg.mozilla.org/integration/autoland/rev/32d459a42132
Handle DecodedStream's resets when calculating VideoChunk durations in the MSG. r=padenot
https://hg.mozilla.org/integration/autoland/rev/0e1bbc399742
Assert that time always goes forward in DecodedStream. r=jya
https://hg.mozilla.org/integration/autoland/rev/ea1f424b6c7d
Remove unused MediaStream::HasCurrentData. r=padenot
https://hg.mozilla.org/integration/autoland/rev/c13cae5f1694
Better handle overlapping video frames in DecodedStream::SendVideo. r=jya
https://hg.mozilla.org/integration/autoland/rev/8b2771254100
Make NotifyOutput from SourceMediaStream tracks reflect amount of data in the track. r=padenot
https://hg.mozilla.org/integration/autoland/rev/93eb36847687
Drop frames more accurately when ending and interrupting video track playback. r=padenot
https://hg.mozilla.org/integration/autoland/rev/2787160c8537
End a track only after the graph has reported reaching its end time in DecodedStream. r=jya,padenot
https://hg.mozilla.org/integration/autoland/rev/e2a82cb2f43c
Update WPT expectations. r=padenot
Depends on: 1546655
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: