Open Bug 1404278 Opened 7 years ago Updated 2 years ago

[MSE] glitch between first and second segment of audio (hls)

Categories

(Core :: Audio/Video: Playback, defect, P3)

57 Branch
defect

Tracking

()

UNCONFIRMED

People

(Reporter: thomas_jenkinson, Unassigned)

References

(Depends on 1 open bug)

Details

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36

Steps to reproduce:

Click play on a track on soundcloud. E.g. https://soundcloud.com/epicmountain/sun-on-earh


Actual results:

Just before the 0:02 there is a blip in the audio. This is at the boundary between the first and second segment.


Expected results:

There should be no blip in the audio. It should move seamlessly between the segments.

This blip only occurs when the audio element is already in the playing state when the first segment is appended to the buffer. If we wait a bit and then call play() after the first segment has already been appended, there is no blip. Also when replaying over the problem area the blip has usually gone.

This problem is currently still present on soundcloud.com but we have a workaround which we will probably deploy some point next week. The workaround is to seek to 50ms into the media. I am not exactly sure why this works, but with this the switch between the first and second segment is seamless. It is interesting because we also used to have the same bug in safari, but now this problem has gone away.

I might be able to provide a demo page for our playback library and some code that demonstrates this issue, but I'm not sure if I would be able to share this publicly.
Component: Untriaged → Audio/Video: Playback
Product: Firefox → Core
Version: 55 Branch → 57 Branch
Flags: needinfo?(jyavenard)
The 'fix' I mentioned above has been deployed now, but if someone wants to look into this let me know and I'll try and put a demo together or something.
(In reply to thomas_jenkinson from comment #1)
> The 'fix' I mentioned above has been deployed now, but if someone wants to
> look into this let me know and I'll try and put a demo together or something.

We're really aggressive with starting audio, which means that calling play() then append will start playing as soon as the first frame is decoded. This shouldn't be an issue if you append multiple audio frames at once. If you're appending one at a time then the Javascript will be racing against the playback engine.

It will be helpful if you can provide a link that reproduces the issue.
Priority: -- → P3
(In reply to thomas_jenkinson from comment #1)
> The 'fix' I mentioned above has been deployed now, but if someone wants to
> look into this let me know and I'll try and put a demo together or something.

Yes please...
Flags: needinfo?(jyavenard)
Interesting. I just put this example together: https://tjenkinson.github.io/firefox-mse-audio-glitch/
We are supposed to hear a glitch right at the start?

I can't hear anything with nightly. But as Anthony mentioned earlier, if what I think it is, what's happening (playback starts and play the first media segment before the second segment is being added).

I suggest you wait for the "canplaythrough" event, if the media had set the autoplaying attribute, this is what in effect it would have waited for (readyState moving to HAVE_ENOUGH_DATA)

calling play() too early, and you have very small segments, it could easily stall.

You can test this by checking if the waiting and/or stalled events are fired,
The glitch happens between the 2 segments at around 2 seconds. I have updated the demo now to only call play() in the canplaythrough event handler and the issue is still there. The issue also happens if play() is called after the last append completes.

I am also able to reproduce in quantum version 57.0b6 (64-bit), but interestingly only when dev tools is open and the http cache is disabled.
(In reply to thomas_jenkinson from comment #6)
> The glitch happens between the 2 segments at around 2 seconds. I have
> updated the demo now to only call play() in the canplaythrough event handler
> and the issue is still there. The issue also happens if play() is called
> after the last append completes.
> 
> I am also able to reproduce in quantum version 57.0b6 (64-bit), but
> interestingly only when dev tools is open and the http cache is disabled.

canplaythrough requires 10s of data to be buffered before play start. If you still hear the glitch, then the issue is likely different to what first thought.

caching shouldn't make a difference though, XHR request aren't cached...

I did manage to hear the small interruption at 2s once, buut that's out of about 20 reload...
I am able to get the issue consistently after clicking refresh with the cache disabled, and with cache enabled it only happens occasionally.

Also there is only 5 seconds of data but canplaythrough is still firing?

strange
(In reply to thomas_jenkinson from comment #8)
> I am able to get the issue consistently after clicking refresh with the
> cache disabled, and with cache enabled it only happens occasionally.
> 
> Also there is only 5 seconds of data but canplaythrough is still firing?

Hmmmm, you're right...

Upon debugging, it's because your MP4 metadata has a duration set to 4294967295 (e.g. UINT32_MAX), what we use to mark it as infinite and so we assume it's infinite. This prevents us from knowing if you have enough data to play without interruption.

And so we assume you always have enough data: canplaythrough will be fired after the demuxing of the first moof.
We do handle infinite streams differently than others, we pretty much assume they are live ones where low latency matters more.

So we're back to the original hypothesis, my guess is that playback reaches the end of the first segment before it has time to receive the next one (could even be time to process the 2nd moof in your first segment).

So work around: set a proper duration in your MP4 metadata, or set it by JS.
On second thought, this seems to be a bug in our MP4 demuxer, the duration is marked as UINT32_MAX, yet the rust MP4 demuxer returns that it has no duration.

Link to the audio segment is https://tjenkinson.github.io/firefox-mse-audio-glitch/segments/segment0.m4a
Flags: needinfo?(ayang)
See Also: → 1407243
Depends on: 1407261
I've created bug 1407261 to specifically look at why the first segment duration is treated as infinite.

canplaythrough event really should be the event to listen for in order to determine if playback can start.

For this particular file, the issue is tracked in bug 1407243
Flags: needinfo?(ayang)
Thanks. Even if play() is called after the second segment append finishes, there is still the glitch, so I guess at that point it is still processing internally? If I immediately seek to 50ms the problem doesn't happen. Could play() wait for whatever the browser is waiting for after a seek request?
No, when updateend is fired, the processing has completed.

If you still hear it even when 5s of audio has been added, then it's likely not an issue with the demuxing side of things. The buffered range is continuous.
Maybe the mp3 decoder produces discontinuous output. can you reproduce it on other platforms like mac?

:jwwang could you check that stream and verify that the data fed to the MDSM is continuous?

And why doing a seek to 50ms first prevent the problem
Flags: needinfo?(jwwang)
I am testing on macos Sierra (10.12.6).

Calling `play()` after the first updateend has the glitch. Calling `play()` after the second updateend doesn't have the glitch.

When calling `play()` after the first updateend the second updateend is fired before reaching the blip point though.
I can't repro the issue by playing this sample on Linux: https://tjenkinson.github.io/firefox-mse-audio-glitch/.

Nor did I find gaps in audio data.
(MOZ_LOG=AudioStream:3 will show gaps in audio in the form of "lost xxx frames".)
http://searchfox.org/mozilla-central/rev/1a4a26905f923458679a59a4be1e455ebc53c333/dom/media/AudioStream.cpp#664
I can repro the issue on osx by playing https://tjenkinson.github.io/firefox-mse-audio-glitch/.

https://s3.amazonaws.com/tmp.transloadit.com/0a0509f0aedc11e79d76cf6b72c2e00f.png
This waveform shows the decoded data sent to the audio device and you can see there is a gap in the middle which caused the blip.

So it is either the original encoded data is bad or our decoder is bad to cause the gap.

Can you write another test page which seeks to 50ms to avoid the blip?

Thanks!
Flags: needinfo?(thomas_jenkinson)
I had totally missed that the issue was happening on Mac. I was testing on Windows, maybe why I didn't hear anything.
That would explain why Safari had the issue too.

:jw_wang can you check still that it's not due to entering waiting for data mode inside the MediaFormatReader which cause internal seeking to be set.
(In reply to Jean-Yves Avenard [:jya] from comment #17)
> :jw_wang can you check still that it's not due to entering waiting for data
> mode inside the MediaFormatReader which cause internal seeking to be set.

Do you mean to check if MDSM's audio request is rejected by WAITING?
that would be a side consequence yes... after decoding has started and the MFR has started providing back decoded data.
Internal Seek is the mode the MFR enters when it finds a gap in the data. It then seek to the last demuxed point and start decoding again.

It would be surprising to always happen at 2s though.

If it does encounter the gap, and is seeking, it could be related to bug 1400674 and/or bug 1321249. Bug 1321249 in particular means that the decoder could output silence with the first decoded sample, which we're supposed to trim (and we don't).

That would explain why you hear a 1024 samples (0.02s silence) silence.
Isn't it that sometimes MFR does internal seeking such that MDSM doesn't feel WAITING at all?
(In reply to JW Wang [:jwwang] [:jw_wang] from comment #20)
> Isn't it that sometimes MFR does internal seeking such that MDSM doesn't
> feel WAITING at all?

no...

When the MFR encounter a gap, it drains the decoder to return has many frames as possible, and after returning that last frame will reject the DataPromise with WAITING_FOR_DATA.

It will then start seeking to last frame returned.
Once new data is available, it will resolve the waiting promise and wait for the next Request*Data and restart decoding from there.

At the time the MFR seek, it doesn't yet know when new data will be coming, which is why WAITING_FOR_DATA is returned
Here is a version with the seek. https://tjenkinson.github.io/firefox-mse-audio-glitch/seek-hack.html

I realised this seek 'hack' also only works when it happens after play() in the loadedmetadata handler. Doing it in canplaythrough doesn't work. Also it still blips occasionally but much less frequently.
Flags: needinfo?(thomas_jenkinson)
(In reply to Jean-Yves Avenard [:jya] from comment #17)
> :jw_wang can you check still that it's not due to entering waiting for data
> mode inside the MediaFormatReader which cause internal seeking to be set.

My test showed that MDSM got NS_ERROR_DOM_MEDIA_WAITING_FOR_DATA @2948934us.
Flags: needinfo?(jwwang)
(In reply to thomas_jenkinson from comment #22)
> Here is a version with the seek.
> https://tjenkinson.github.io/firefox-mse-audio-glitch/seek-hack.html
> 
> I realised this seek 'hack' also only works when it happens after play() in
> the loadedmetadata handler. Doing it in canplaythrough doesn't work. Also it
> still blips occasionally but much less frequently.

The blip is still present in this test page.

MDSM got 2 NS_ERROR_DOM_MEDIA_WAITING_FOR_DATA, one at t=50000, and the other at t=2929274.
Yes even with the 'fix' it was still happening for me on the test page sometimes. We are doing pretty much the same thing on soundcloud.com though and it seems to solve the problem there.
Waiting for canplaythrough before calling play() might work, but couldn't the same issue also occur if the user seeks to an unbuffered region, once media is then inserted there? Should we always call pause() when the player stalls and wait for canplaythrough before playing again?
We don’t know on why the audio decoder is inserting silence. So it’s all conjecture at this point.
I can't repro the issue on Nightly. Can you try again?
Flags: needinfo?(thomas_jenkinson)
(In reply to JW Wang [:jwwang] [:jw_wang] from comment #28)
> I can't repro the issue on Nightly. Can you try again?

I can still reproduce in 58.0a1 (2017-10-22) (64-bit) (on mac), with dev tools open and the 'Disable HTTP Cache' option enabled.
Flags: needinfo?(thomas_jenkinson)
Any update on this? I can still reproduce in latest firefox (57.0).

Thanks
Bump. This still happens in 58.0.2

Anything I can do to help with debugging?
Will https://bugzilla.mozilla.org/show_bug.cgi?id=1407261 fix this? If someone points me at the right place I can have a look around and see if I can fix it
Flags: needinfo?(jyavenard)
(In reply to thomas_jenkinson from comment #32)
> Will https://bugzilla.mozilla.org/show_bug.cgi?id=1407261 fix this? If
> someone points me at the right place I can have a look around and see if I
> can fix it

no, I don't think it's related...

It could be that there's a gap happening when playback has started and the decoder hit the end of the buffered range, before the 2nd segment is appended.

This causes to seek back slightly (due to encoder delay) in order to resume in the right location.

maybe there's something wrong happening and silence is introduced when it shouldn't or something to that extent.

If you could record the output (like using Sound Flower on mac) could have a look at what you're hearing (I can't hear anything myself)
Flags: needinfo?(jyavenard)
Can you confirm that https://tjenkinson.github.io/firefox-mse-audio-glitch/seek-hack.html still produces the issue for you?

I'm not hearing anything...
That's the wrong URL. You should test with https://tjenkinson.github.io/firefox-mse-audio-glitch/index.html.

The one you were using contains the seek which somehow works around the issue (most of the time) :)

I just tested https://tjenkinson.github.io/firefox-mse-audio-glitch/index.html again in 59.0.2 (64-bit) and 61.0a1 (2018-04-26) (64-bit) and it's still there unfortunately
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.