Closed Bug 1517199 Opened 3 years ago Closed 5 months ago

Setting playbackRate on a media element does not work when the media element is going to the MediaStreamGraph

Categories

(Core :: Audio/Video: MediaStreamGraph, defect, P1)

defect

Tracking

()

RESOLVED FIXED
91 Branch
Tracking Status
firefox91 --- fixed

People

(Reporter: ntim, Assigned: alwu, Mentored)

References

(Blocks 1 open bug)

Details

(Keywords: dev-doc-needed, Whiteboard: [lang=c++] [media-audio])

Attachments

(6 files, 4 obsolete files)

48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
+++ This bug was initially created as a clone of Bug #966247 +++

The playbackRate property of the HTMLMediaElement has the signal processing bit implemented at the AudioStream level, to minimize latency between the setting of the property and the actual effect on the sound.

This was okay before, but because we can now capture a stream (mozCaptureStream{,UntilEnded}, and MediaElementAudioSourceNode) without directly touching the AudioStream, the playbackRate is ignored in those cases.

It would be nice to be able to have the playbackRate property working regardless of the mean we use to output the sound.
Duplicate of this bug: 1519707

I'm facing the same problem after I changed the volume via GainNode and changing the playbackRate of the HMLAudioElement. Changing the playbackRate to 2.0 works with Chrome and Opera, but not with Firefox. To find a solution, I created a post on stackoverflow:
https://stackoverflow.com/questions/57391854/how-to-set-playbackrate-of-htmlaudioelement-in-combination-with-a-sourcenode-and

I don't understand why the playbackRate can't be changed when I just changed the volume with the Web audio API. Also I tried to find the proof that changing playbackRate is not supported in the specs. I found nothing. It doesn't make sense to me.

Please correct me, if im wrong. In test provided by Daniel, playback rate is changing, while we are in Bypass mode (isBypassing == true) and, doesnt react in other mode (when isBypassing == false). That is the problem, it should has the same behavior in both modes, am I right?
I think I can try to solve this.

(In reply to Vadim from comment #4)

Please correct me, if im wrong. In test provided by Daniel, playback rate is changing, while we are in Bypass mode (isBypassing == true) and, doesnt react in other mode (when isBypassing == false). That is the problem, it should has the same behavior in both modes, am I right?
I think I can try to solve this.

You are right. That's what i also experience. Is there any progress on that?

Priority: P2 → P3
Whiteboard: [lang=c++] → [lang=c++] [media-audio]
Duplicate of this bug: 1624548

is there any progress on this issue? It's over an year old and still not fixed...

Hi, Paul,
Could you give me any suggestion about how should I fix this bug?
Thank you.

Assignee: nobody → alwu
Flags: needinfo?(padenot)

This method appends the audio to the graph. It's necessary to time-stretch it (for now we only apply the volume), using SoundTouch, like we do in AudioStream.cpp. It might be necessary to extract the time-stretcher if we're using it in two locations, to have the same options, etc.

Flags: needinfo?(padenot)
Duplicate of this bug: 1648277
Blocks: 1654518
Duplicate of this bug: 1660534

Any progress or update on this bug? It's been 2 years now...

Duplicate of this bug: 1675340

Has not yet appeared information?Very necessary thing

Paul, do you have an updated permalink in comment 9? It would be helpful for a contributor who wants to look at it in their free time. I may have time in the next months but I can't promise anything, though I'm sure anyone who's free right now would find this helpful :)

Flags: needinfo?(padenot)

(In reply to Tim Nguyen :ntim from comment #15)

Paul, do you have an updated permalink in comment 9? It would be helpful for a contributor who wants to look at it in their free time. I may have time in the next months but I can't promise anything, though I'm sure anyone who's free right now would find this helpful :)

Hi, I've recently looked into the code but I'm not really fluent C++. Paul was referring to this line at the time, so the DecodedStream::SendAudio method. This is the up-to-date permalink.

Duplicate of this bug: 1251640
Duplicate of this bug: 1323012

(In reply to Nerixyz from comment #16)

Hi, I've recently looked into the code but I'm not really fluent C++. Paul was referring to this line at the time, so the DecodedStream::SendAudio method. This is the up-to-date permalink.

Thanks! I've taken a quick look at this and all the extra context around it, there are a couple of things I found:

  • bug 966247 comment 32 from :jwwang, so this will need some extra work for video input as well, and buffering latency as well.

http://searchfox.org/mozilla-central/rev/7419b368156a6efa24777b21b0e5706be89a9c2f/dom/media/mediasink/DecodedStream.h#33

We will need to overhaul DecodedStream which is responsible for sending
audio/video frames to a media stream.

To support playbackRate rate changes in video is simple. We just need to
modify the duration of each frame (duration /= playbackRate) before sending
it to the media stream.

We have more work to do when it comes to audio. We will need to employ
SoundTouch (which is also used by AudioStream) to stretch audio frames in
response to playbackRate changes.

The biggest issue is the latency in playbackRate changes introduced by the
push-model of DecodedStream. DecodedStream pushes audio/video frames to the
media stream immediately after the frames are received by MDSM. So if the
amount of decode-ahead is 10s, the change in playbackRate will take effect
10s later.

One solution is to change the push-model to pull-model to minimize the
latency (which we have done for AudioStream!). However it requires rewriting
most of the code of DecodedStream which is certainly not trivial.

  • bug 1443511 which solves the latency mentioned by the previous comment for volume changes (without rewriting DecodedStream)

  • Relevant code in AudioStream.cpp is mTimeStretcher and notably GetTimeStretched

My familiarity with low-level audio/stream concepts isn't so good, so I hope that can be useful to anyone who wants to look at this.

Sorry for taking the bug but didn't do any actions, I was working on other higher priority stuffs. If anyone gets interest in this bug, feel free to take it, or I can take this bug again.

So from the things I know so far, the work would need to be done inside GraphDriver where we create low-level audio stream (cubeb) for the Gecko-level audio stream.

AudioStream is used for normal audio playback where we would perform the playback rate change so this issue won't happen on normal playback.

Severity: normal → S3

Alastor, I might take this again after I'm done with the telemetry stuff, let's keep each other informed of our respective progress.

Flags: needinfo?(padenot)

Guys, is this really that hard to fix?

I started looking at this issue today, but I'm not sure if my direction is correct. First, as the AudioCallbackDriver is the place interacting with cubeb, that probably is the place to change the playback rate. So I started adding a method on MediaTrackGraph, which is to set the playback rate on the driver, and send the playback rate from DecodedStream via accessing mData->mAudioTrack->Graph(). That would append a message in order to set the playback rate to the driver on the graph thread.

However, when digging more details in the AudioCallbackDriver, I found that it's actually a source to accept a mixed stream via MixerCallback(), which is a mixed stream from different input and output stream together, that seems indicating that the playback rate change should not put on here, because not all streams need to be modified the rate.

Paul, would you mind to give me some suggestion about this?

Thank you.

Flags: needinfo?(padenot)

In addition, the work for this issue are more media stream related, so move this to another component.

Component: Audio/Video: Playback → Audio/Video: MediaStreamGraph

This needs to happen in DecodedStream (around here. Let me know if a quick call would be useful to get you started?

Flags: needinfo?(padenot)

Quick update, I've talked to Paul today, the current architecture of processing audio samples in DecodedStream is pull-driven, which makes us hard to implement a solution for playback rate on a realtime stream, because push-driven method causes some latency. That means the playback rate won't take effect immediately. Therefore, we will need to do some work to change this behavior and make a better solution for playback. That would need more time, so sorry for the waiting.

AudioDecoderInputTrack stores the audio data in a SPSC queue which allows MTG to pull data from it by calling AudioDecoderInputTrack::AppendData().

As we will use AudioDecoderInputTrack for audio track later, need to remove some dependency of SourceMediaTrack first.

Depends on D106040

Hi, Andreas,
My current patches implement a new class AudioDecoderInputTrack to fulfill the pull-driven track for DecodedStream. However, after applying those patches, take comment1 as an example, the output sound is completedly distortion, and I haven't found the reason.
(Note, those patches haven't performed time stretching yet. They are simply changing the push-driven to pull-driven)
Could you give me some suggestion and see if anything I did is wrong?
Thank you so much.

Flags: needinfo?(apehrson)

I'm going to build your patches and dump the audio (MOZ_DUMP_AUDIO=1) to see if I can hear/inspect what the problem seems to be. In the meantime I'll provide some pointers on the AudioDecoderInputTrack for some best-practice things.

Flags: needinfo?(apehrson)

One issue I see is that you include silence from underruns in the clock. That will confuse the state machine because you will overrun the duration before ending. The other one I see is that you assume AUDIO_FORMAT_S16. This is causing the distortion. On desktop we always use float. On Android we still use int16 I believe, but that is subject to change. Some older versions did not support float AIUI.

s/AUDIO_FORMAT_S16/AUDIO_OUTPUT_FORMAT/ will make your troubles go away.

This could be handy perhaps, I rewrote the test case in comment 1 a bit to better be able to reason about the output. See https://codepen.io/pehrsons/pen/KKNyVmX. It generates a 100Hz sine tone on the fly (needs a pref) and plays that. I also removed the filter, so activating the filter will now turn on playback through the graph using DecodedStream. With MOZ_DUMP_AUDIO=1 it's useful to look at the generated wav file (appearing in the working directory, sandbox needs to be off) in e.g., Audacity.

Hi, Andreas,

Thank for the suggestion you gave me last time, and now I have some new questions. So for current WIP, I changed the way how DecodedStream deal with the audio data, now audio track would determine when it should end, instead of setting the end time via EndTrackAt() and comparing the time in NotifyOutput().

The reason of doing that is because, at the time calling EndTrackAt() DecodedStream actually didn't know the actual frames amount which would be writed. That would be determined later while appending data onto mSegement in AudioDecoderInputTrack (appending remaining + resampling / time stretching)

The problem I encounter now is that, I don't know how to suspend the track when DecodedStream is not playing (i.e. media is paused). So even if the DecodedStream is not playing, AudioDecoderInputTrack would keep appending silience to the MTG, which causes DecodedStream::NotifyOutput() being called during media is paused.

I saw that MediaTrack::Suspend() seems something I can try, but I'm not sure if that is a right way to do, because MTG might have its own way to manage the tracks.

Thank you so much.

Flags: needinfo?(apehrson)

Suspending the input track wouldn't be enough, because you could still be underrunning while the DecodedStream is playing, and NotifyOutput would wrongly account for that silence too.

I mentioned a bit on this under Q5 in Paul's doc, pasted here for reference:

Q5 : Do I need to care about the A/V synchronization? Or that will be handled by MTG?

Currently the audio clock reported from DecodedStream to MediaDecoderStateMachine is powered by MediaTrackListener::NotifyOutput. You will have to change this because ProcessedMediaTracks are not allowed to underrun (so silence you add because no real data was available would count towards progressing the clock). But you can keep the listener for video and for audio you can call directly into NotifyOutput from the graph thread in ProcessInput. mOnOutput needs to signal the amount of real samples added to the track. When time stretching, I guess this number of frames would have to be converted too(?). Likewise for video in video-only cases, I suppose.

Flags: needinfo?(apehrson)

Thanks! That did solve my problem. Will update patches later after having more progress.

Quick update, I've finished the implementation of changing the data process model in DecodedStream, now I'm testing them to ensure that they won't cause any breakage. If it goes well, then I will start the next step that is to implement the time stretching for audio data.

Landing these separately could be useful for detecting regressions from the first piece.

Depends on: 1695265

Comment on attachment 9204703 [details]
Bug 1517199 - part1 : implement AudioDecoderInputTrack.

Revision D106040 was moved to bug 1695265. Setting attachment 9204703 [details] to obsolete.

Attachment #9204703 - Attachment is obsolete: true

Comment on attachment 9204704 [details]
Bug 1517199 - part2 : use MediaTrack as input parameter for DecodedStreamGraphListener's method.

Revision D106041 was moved to bug 1695265. Setting attachment 9204704 [details] to obsolete.

Attachment #9204704 - Attachment is obsolete: true

Comment on attachment 9204705 [details]
Bug 1517199 - part3 : use AudioDecoderInputTrack for audio in DecodedStream.

Revision D106042 was moved to bug 1695265. Setting attachment 9204705 [details] to obsolete.

Attachment #9204705 - Attachment is obsolete: true

Quick update: the pre-requirement work has been moved to bug1695265, will continue work on this one after finishing that.

update : just landed bug1695265, and I will continue to do the rest of work in this bug.

update : my current WIP can adjust playback rate successfully, will do more testing to ensure that everything goes well.

update : hooray! now my patch work on all kinds of audio, I tested that on Youtube and it worked well! I'm going to test my patch on the try server to see if we get any failure.

Attachment #9220730 - Attachment description: WIP: Bug 1517199 - time stretching samples in AudioDecoderInputTrack. → Bug 1517199 - part1 : time stretching samples in AudioDecoderInputTrack.

Depends on D114560

This is great news :alwu.
I emailed you another test case where I experienced something which I think is related to this issue.
Can you please see if your patch has an effect on it?

Short summary sample is when trying to use an AudioContext on a video element, you lose the ability to setPlaybackRate.
Simplest example being:
( new AudioContext() ).createMediaElementSource( obj_video_tag )

Flags: needinfo?(alwu)

Yes, I've tested your site and it works well after applying my fix.

Flags: needinfo?(alwu)

I'm going to target landing these patches on Fx90, so change the priority to P1.

Priority: P3 → P1

Depends on D114916

Depends on D114921

I found another problem in DecodedStream that would make us incorrectly determine when the video would end if we adjust playback rate to 1x+ (eg. 2x). Currently the video track and the audio track use different type of media stream track, and the one that video track uses doesn't support for the playback rate changes, which makes the video output time from the track is slower than the acutally audio output time.

Eg. if I play a video, which duration is 60s, in 2x playback rate, when the audio ends in 60s, the video output time is still in 30s. So we have to wait another 30s in order to receive the ended event for media element.

I'm working on this problem, and will submit another patch to handle that.

When we adjust the playback rate on the audio track, the audio clock time would be no longer align with the graph time.

Eg. playback rate=2, when the graph time passes 10s, the audio clock time actually already goes forward 20s.

After audio clock ended, the video clock would start to drive the clock time but the video clock time is align with the graph time, which means it would be smaller than the audio clock in that situation.

Therefore, we have to ignore the video clock time if that happens. In addition, this patch also address the duration change on video frames based on the playback rate.

Depends on D114922

Attachment #9221710 - Attachment description: Bug 1517199 - part5 : handle playback rate change on video track and add a crash test. → Bug 1517199 - part5 : handle the playback rate change on video track and add a crash test.

Update : currently my patches already got r+, but the reviewer want me to add more test to ensure the A/V won't be broken when we change the playback rate for the media stream track.

When I was watching normal videos on Youtube and changed its playback rate during capturing the stream, I didn't notice that anything wrong on the A/V sync. But for a more precise check, I started testing on this video where we can clearly notice the defect if the A/V isn't sync.

Then, I found out most of time the A/V sync works well if you capture stream before video starts playing. But sometime it would be broken, one example here is to start video first without capturing the stream, then modify the playback rate to 0.5x, then we can see the A/V out of sink. Here is the profiled result. When DecodedStream's clocktime was 4040854, VideoSink already outputed some frames that were later than the clocktime.

At that time the frames in VideoSink are like this, it should only play the first frames, instead of those frames later than the clock time.

[Child 674857: MediaDecoderStateMachine #1]: V/MediaDecoder DecodedStream=7f94230e4800 time is now 4040854
[Child 674857: MediaDecoderStateMachine #1]: D/MediaDecoder DecodedStream=7f94230e4800 Dropping audio [4014500,4034500] sz=66
[Child 674857: MediaDecoderStateMachine #1]: V/MediaDecoder VideoSink=7f9410d857c0 playing video frame 4037000 (id=7b) (vq-queued=5)
[Child 674857: MediaDecoderStateMachine #1]: V/MediaDecoder VideoSink=7f9410d857c0 playing video frame 4071000 (id=7c) (vq-queued=5)
[Child 674857: MediaDecoderStateMachine #1]: V/MediaDecoder VideoSink=7f9410d857c0 playing video frame 4104000 (id=7d) (vq-queued=5)
[Child 674857: MediaDecoderStateMachine #1]: V/MediaDecoder VideoSink=7f9410d857c0 playing video frame 4137000 (id=7e) (vq-queued=5)
[Child 674857: MediaDecoderStateMachine #1]: V/MediaDecoder VideoSink=7f9410d857c0 playing video frame 4171000 (id=7f) (vq-queued=5)

But the weird thing is that, when you change the rate to other value than 0.5x, such like 1.5x, 2.0x, it would only have a short period of times where A/V sync is broken (eg. ~1s), then adjust back to normal soon. But for 0.5x, it would still stuck on the wrong A/V. However, if you pause the video, and resume it again, the A/V would back to normal immediately no matter what playback rate you set before.

I need to check the logic in VideoSink to see what happened. In addition, I also noticed another issue that is sometime if we change the playback rate dramastically, eg. from 2.0x to 0.5x (only happening from very fast to very slow), then the audio clock would get stuck. The reason of clock getting stuck is because MDSM stopped decoding audio data as we have already had many audio data due to previous fast playback rate (MDSM would control how many audio data we need to buffered based on the playback rate.) I haven't fully understood why that would happen comparing with simply playing video in a very slow playback rate. And wonder if those two issues have any similarity in the root cause.

I think I understand why A/V isn't sync after we capture the stream. That is actually not related with changing the playback rate. There are two reasons (1) we incorrectly discard some audio frames when we swtich to the DecodedStream from AudioSink (2) clock time isn't consistent between two sinks.


The first issue "incorrectly discarding some audio frames" happens when we switch to the DecodedStream from AudioSink. AudioSink would store some frames in its own media queue before those audio frames get played. After AudioSink got shutdown, it doesn't push those frames back to the original audio queue. Therefore, when DecodedStream starts, it try to retrieve audio data from the audio queue, but it doesn't know that there is already a gap between "the audio DecodedStream is going to play" and "the audio that AudioSink had played". This issue also causes the problem that I mentioned in comment54, where the audio clock would stop working during the dramastically playback rate change.

After solving the first issue, the A/V syncs would be getting closer, but still has a little bit delay between audio and video. Because the DecodedStream will determine the audio clock based on the start time. Currently the start time is based on the current media time that actually has some difference with the audio clock time. The audio clock time is calculated based on the amount of audio frames we've played. So if the DecodedStream use the start time that is different with the amount of audio frames we've played, then the audio clock would be shifting a little bit from the video.

If I simply solved the first issue, but not to address the second issue. I can still notice that around 2 frames shifting between audio and video. But if I solved the second issue as well by using this patch. Then I can see the A/V sync works perfectly.

The interesting thing is that, the first issue has existed for a long while. But on the version where I didn't land the refactor in bug1695265 yet, I didn't see A/V going unsync after capturing a media stream. TBH I don't know why, but abandoning the frames that we've not played before is apparently wrong to me.

When we switch to the DecodedStream from AudioSink. AudioSink would store some frames in its own media queue before those audio frames get played. After AudioSink got shutdown, it doesn't push those frames back to the original audio queue.

Therefore, when DecodedStream starts, it try to retrieve audio data from the audio queue, but it doesn't know that there is already a gap between "the audio DecodedStream is going to play" and "the audio that AudioSink had played".

Comment on attachment 9223069 [details]
Bug 1517199 - part6 : ensure AudioSink won't discard frames that are not played yet.

Revision D115749 was moved to bug 1712595. Setting attachment 9223069 [details] to obsolete.

Attachment #9223069 - Attachment is obsolete: true
Depends on: 1712595
Depends on: 1712598
Attachment #9221710 - Attachment description: Bug 1517199 - part5 : handle the playback rate change on video track and add a crash test. → Bug 1517199 - part5 : handle the playback rate change on video track and add tests.

Update : reviews are all going well, and very close to finish. But we're in the soft-code-freeze already, enabling a new feature (adjusting playback rate on the captured stream) is not encourage by the release manager. So I will land these patches on Fx91, and sorry couldn't make that happen on Fx90.

But I will land those blocker bugs that will fix the A/V sync issue on Fx90.

Attachment #9221710 - Attachment description: Bug 1517199 - part5 : handle the playback rate change on video track and add tests. → Bug 1517199 - part5 : handle the playback rate change on video track and add crash test.
Pushed by alwu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/172ef2cae6ed
part1 : time stretching samples in AudioDecoderInputTrack. r=padenot
https://hg.mozilla.org/integration/autoland/rev/772b0a555674
part2 : enable wpts. r=padenot
https://hg.mozilla.org/integration/autoland/rev/8d6a8d4836ea
part3 : add more logs. r=padenot
https://hg.mozilla.org/integration/autoland/rev/1c845c5e1253
part4 : add gtest. r=padenot
https://hg.mozilla.org/integration/autoland/rev/b1945b566398
part5 : handle the playback rate change on video track and add crash test. r=padenot
https://hg.mozilla.org/integration/autoland/rev/2d291a99004d
part6 : add A/V sync test case for changing playback rate dynamically. r=padenot

Release Note Request (optional, but appreciated)
[Why is this notable]: This fixes a long existing issue (originally reported in bug966247, 7 years ago) so I think adding this on the note would be good for people waiting this change.
[Affects Firefox for Android]: Yes
[Suggested wording]: Fix the issue of changing playback rate failed on the media element which stream is being captured.
[Links (documentation, blog post, etc)]: None

relnote-firefox: --- → ?

This is more of a thing for developers, setting dev-​doc-needed.

Suggested wording for developers:

  • Changing the speed of a media element ("<video>" or "<audio>") using the playbackRate attribute now works when the media element is captured to a MediaStream or via AudioContext.createMediaElementSource.
Keywords: dev-doc-needed

Unfortunately there is no way to detect this, so websites using this feature are likely to UA sniff, and so it won't start working magically without a fix.

relnote-firefox: ? → ---
You need to log in before you can comment on or make changes to this bug.