Closed Bug 1207946 Opened 9 years ago Closed 9 years ago

Broken A/V sync with a youtube video

Categories

(Core :: Audio/Video: Playback, defect, P1)

43 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla44
Tracking Status
firefox42 --- fixed
firefox43 --- fixed
firefox44 --- fixed

People

(Reporter: jya, Assigned: jya)

References

(Blocks 1 open bug, )

Details

Attachments

(5 files, 3 obsolete files)

I cant always make it happen.

STR:
- Open https://www.youtube.com/watch?v=fhcZ6b2FSsk
- Change resolution to 240p
- Let it play a bit.
- seek to 1:53
- seek to 2:44

Audio is stopped, video plays.
Sometimes audio plays but severely out of sync.

Example of about:media
Internal Data:
Dumping data for demuxer 1550219e0:
Dumping Audio Track Buffer(audio/mp4a-latm): - mLastAudioTime: 116.238684
NumSamples:7980 Size:4807838 NextGetSampleIndex:699 NextInsertionIndex:7980
Buffered: ranges=[(100.008344, 116.238684), (140.016326, 309.080272)]
Dumping Video Track Buffer(video/avc) - mLastVideoTime: 193.600000
NumSamples:5521 Size:6879789 NextGetSampleIndex:2058 NextInsertionIndex:5521
Buffered: ranges=[(110.000000, 135.000000), (150.000000, 309.033333)]

currentTime was 180s ; while audio decode was stuck at 116s (here we had no audio playing, but video did)
Got about 4s off A/V sync:

https://www.youtube.com/watch?v=fhcZ6b2FSsk
	mediasource:https://www.youtube.com/1a838b1a-1dca-df4f-8cdd-c73b7d5bd639
	currentTime: 166.205319
		SourceBuffer 0
			start=140.016326 end=309.080272
		SourceBuffer 1
			start=140 end=309.033333
	Internal Data:
	Dumping data for demuxer 12b2b7080:
		Dumping Audio Track Buffer(audio/mp4a-latm): - mLastAudioTime: 163.537913
			NumSamples:7281 Size:4386596 NextGetSampleIndex:1013 NextInsertionIndex:7281
			Buffered: ranges=[(140.016326, 309.080272)]
		Dumping Video Track Buffer(video/avc) - mLastVideoTime: 167.000000
			NumSamples:5071 Size:6319506 NextGetSampleIndex:810 NextInsertionIndex:5071
			Buffered: ranges=[(140.000000, 309.033333)]

notice that currentTime is at 166s, yet audio decode was at 163s and video was 167s (slightly ahead)
Attached file playback.log.zip
The broken A/V sync always occurs when audio is evicted and the AudioData Promise enters is rejected with waiting_for_data ; the MDSM then enters in buffering mode ; yet video continues to be requested and played (19 frames queued). Once audio request resume ; the decoder shows that audio 173290317 is getting played ; yet just above we can see that the last video frames played was 174433333 ; and now A/V sync is broken by 1.1s.

The issue occurs from line 213434; where the audio sourcebuffer got cleared (following a change of resolution from 240p to 240p). Line 213538 is when the first audio request got rejected.

JW : do you have an idea on what could be occurring ? the issue seems related to the MDSM.

We've had A/V sync issue for quite a while , but I had never been able to fully reproduce the problem until now.
Flags: needinfo?(jwwang)
(In reply to Jean-Yves Avenard [:jya] from comment #2)
> Created attachment 8665763 [details]
> playback.log.zip
> 
> The broken A/V sync always occurs when audio is evicted and the AudioData
> Promise enters is rejected with waiting_for_data ; the MDSM then enters in
> buffering mode ; yet video continues to be requested and played (19 frames
> queued).

Playback will be stopped when entering buffering state. Video rendering should also be stopped as well. Can you show me the line numbers in the log where video continues to play in buffering state?
Flags: needinfo?(jwwang)
Sorry, I meant the RequestVideoData is continuously called (while in waiting mode for audio)


Line 220463 it changed from BUFFERING to DECODING and it starts playing video frame 173266666
but the audio it starts playing is DecodedAudioDataSink=10013f2d0 playing 1024 frames of audio at time 172082947

so 1.06 behind the video.
The compositor will schedule the time to render video frames. So when video frame 173266666 is sent to the compositor, it should be rendered about 1s later instead of being renderer immediately.

I will add some logs to see if timestamps attached to video frames are correct. If they are, we will need compositor guys to check if video frames are scheduled and rendered at the wrong time.
Should the first frame in the trackbuffer have a time superior to the seek target, mNextSampleTime would be incorrectly set to 0.

This isn't the cause of the problem describe here ; but it could cause a seek to be incorrect should sourcebuffer.remove() be called right after the seek.
Attachment #8665820 - Flags: review?(gsquelart)
Attachment #8665820 - Flags: review?(gsquelart) → review+
STR:

Problem #1:
- reload page (https://www.youtube.com/watch?v=fhcZ6b2FSsk)
- As soon as playback starts seek to 2:03.

Result: video restart without audio, about:media will show that audio is stuck at around 4.8s
video:
[(0.000000, 10.000000), (120.000000, 127.400000)]
audio:
[(0.000000, 4.875873), (120.000725, 123.947868)]
This is what patch P2 fixes.
When seeking to 123s ; the keyframe is found at 120s exactly, we attempt then to seek at 120s which the audio range doesn't exactly contain that time, the seek algorithm returns the first keyframe time found before 120s which is 4.852653s

Problem #2:
- Load page
- Change resolution to 240p
- Seek to 1:40
- Click on the settings to change resolution, but do not change resolution just yet
- When playback reach exactly 1:52 change resolution to 240p. Timing is important
Result:
A/V sync is off by 2s
Attachment #8666605 - Attachment is obsolete: true
Attachment #8666605 - Flags: feedback?(gsquelart)
Attachment #8666639 - Flags: feedback?(padenot)
Attachment #8666639 - Flags: feedback?(cpearce)
There's something else going on...

If I return the original audio frame instead of the trimmed one; the A/V sync with the audio being about 2s late is still occurring ; it's like the AudioSink isn't flushed (we feed it with audio of the same timestamp than just prior the seek).

In any case, the issue is in the MDSM / AudioSink or whatever is actually playing the decoded audio, but not the media code.
Can't reproduce the issue in Aurora/Nightly ; this seems to be recent issue (trimming of audio frame is still wrong though)
Comment on attachment 8666639 [details] [diff] [review]
P3. Properly prune audio frame when seeking.

Review of attachment 8666639 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/MediaDecoderStateMachine.cpp
@@ +2782,1 @@
>           frames * channels * sizeof(AudioDataValue));

Pointer arithmetic does that for you already, you're putting your pointer two or four times too far (depending on the type).

Also, please use PodCopy.
Attachment #8666639 - Flags: feedback?(padenot) → feedback-
Attachment #8666639 - Attachment is obsolete: true
Attachment #8666639 - Flags: review?(jwwang)
Attachment #8666639 - Flags: feedback?(cpearce)
Attached file avsync.log.zip
This is MediaFormatReader + MediaDecoder logging ; using steps mentioned earlier.
Comment on attachment 8666636 [details] [diff] [review]
[MSE] P2. Fix seek when target isn't found exactly in a buffered range.

Review of attachment 8666636 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with some nits, and maybe further analysis and/or debugging to ensure edge cases can't cause damage.

::: dom/media/mediasource/TrackBuffersManager.cpp
@@ +1890,5 @@
>    return GetTracksData(aTrack).mBuffers.LastElement();
>  }
>  
> +ssize_t TrackBuffersManager::FindSampleIndex(const TrackBuffer& aTrackBuffer,
> +                                             const media::TimeInterval& aInterval)

line 1894:
Remove 'media::' (to be consistent with Seek below).

@@ +1896,5 @@
> +  for (uint32_t i = 0; i < aTrackBuffer.Length(); i++) {
> +    const nsRefPtr<MediaRawData>& sample = aTrackBuffer[i];
> +    TimeUnit target = aInterval.mStart - aInterval.mFuzz;
> +    if (sample->mTime >= target.ToMicroseconds() ||
> +        sample->GetEndTime() > target.ToMicroseconds()) {

lines 1898-1900:
'target' definition and 'target.ToMicroseconds()' could be done once before the loop.

@@ +1897,5 @@
> +    const nsRefPtr<MediaRawData>& sample = aTrackBuffer[i];
> +    TimeUnit target = aInterval.mStart - aInterval.mFuzz;
> +    if (sample->mTime >= target.ToMicroseconds() ||
> +        sample->GetEndTime() > target.ToMicroseconds()) {
> +      MOZ_ASSERT(sample->mKeyframe, "First frame of group must be a keyframe");

line 1901:
That feels like a bold assertion (assuming that the caller will always give an interval where the first frame that touches or is past the interval is a keyframe). If it's really wanted it should probably be reflected in the method name.

@@ +1908,4 @@
>  TimeUnit
>  TrackBuffersManager::Seek(TrackInfo::TrackType aTrack,
> +                          const TimeUnit& aTime,
> +                          const media::TimeUnit& aFuzz)

line 1911:
Remove 'media::' (to be consistent with 'aTime').
Or add it everywhere if you prefer.

@@ +1912,4 @@
>  {
>    MOZ_ASSERT(OnTaskQueue());
>    auto& trackBuffer = GetTracksData(aTrack);
>    const TrackBuffersManager::TrackBuffer& track = GetTrackBuffer(aTrack);

lines 1914-1915:
'trackBuffer' comes from 'GetTracksData'
'track' comes from 'GetTrackBuffer'
It's a bit confusing!

@@ +1913,5 @@
>    MOZ_ASSERT(OnTaskQueue());
>    auto& trackBuffer = GetTracksData(aTrack);
>    const TrackBuffersManager::TrackBuffer& track = GetTrackBuffer(aTrack);
> +
> +  if (!track.Length()) {

line 1917:
if (track.IsEmpty())
(Though you've consistently never use IsEmpty, so maybe that ship has sailed.)

@@ +1931,5 @@
> +    if (index == TimeIntervals::NoIndex) {
> +      buffered.SetFuzz(aFuzz);
> +      index = buffered.Find(aTime);
> +    }
> +    MOZ_ASSERT(index != TimeIntervals::NoIndex);

line 1935:
Another bold assertion. Can you be sure aTime will always be found in/near buffered?

@@ +1935,5 @@
> +    MOZ_ASSERT(index != TimeIntervals::NoIndex);
> +    TimeInterval target = buffered[index];
> +
> +    int32_t start = FindSampleIndex(track, target);
> +    MOZ_ASSERT(start >= 0);

line 1938:
Now I see how the assertion at line 1901 should always be true, and also the method shouldn't return -1.

Still feels a bit like a house of cards; are you confident this whole sequence of find's will always end well, whatever the website&user throw at us?


I have a dream: It'd be great if there was a way to annotate variables/parameters, to indicate that we've somehow proven that they're in a friendly range, and help the compiler find potential issues that way.
Attachment #8666636 - Flags: review?(gsquelart) → review+
Comment on attachment 8666636 [details] [diff] [review]
[MSE] P2. Fix seek when target isn't found exactly in a buffered range.

looks like I uploaded the wrong patch... sorry for that
Attachment #8666636 - Attachment is obsolete: true
This will force an iterative search upon the next demux request.
Attachment #8667052 - Flags: review?(gsquelart)
Comment on attachment 8667051 [details] [diff] [review]
[MSE] P2. Fix seek when target isn't found exactly in a buffered range.

Review of attachment 8667051 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with nit.

::: dom/media/mediasource/TrackBuffersManager.cpp
@@ +1934,5 @@
> +
> +    TimeInterval target = buffered[index];
> +    int32_t start = FindSampleIndex(track, target);
> +    MOZ_ASSERT(start >= 0);
> +    i = start;

line 1936:
Silent conversion from ssize_t to int32_t, probably ok in practice here, but it still feels dangerous. Why not just have FindSampleIndex return a uint32_t and using uint32_t(-1) as a not-found marker? (Which should never happen anyway, right?)

::: dom/media/mediasource/TrackBuffersManager.h
@@ +292,5 @@
>                      TrackData& aTrackData,
>                      uint32_t aStartIndex);
> +  // Find index of sample. Return a negative value if not found.
> +  ssize_t FindSampleIndex(const TrackBuffer& aTrackBuffer,
> +                          const media::TimeInterval& aInterval);

Remember to update the comment if you change ssize_t to uint32_t.
Attachment #8667051 - Flags: review?(gsquelart) → review+
Comment on attachment 8667052 [details] [diff] [review]
[MSE] P3. Reset cached demuxing index when new data overwrite the end.

Review of attachment 8667052 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with typo in patch comment: "overwrite" -> "overwrites".
Attachment #8667052 - Flags: review?(gsquelart) → review+
Assignee: nobody → jyavenard
Comment on attachment 8665820 [details] [diff] [review]
[MSE] P1. Ensure we stop on the closest keyframe when seeking.

Uplift approval is for all three patches.

Approval Request Comment
[Feature/regressing bug #]: 1171379
[User impact if declined]: Two problems fixed here: stalls after seek or video playing with either no audio or no video. 2) Bad A/V Sync issue, audio being several seconds behind video.
[Describe test coverage new/current, TreeHerder]: over 50 hours of manual testing, mochitest and webref tests. Now in central.
[Risks and why]: Low. We take a worse case approach so it can only work better.
[String/UUID change made/needed]: None
Attachment #8665820 - Flags: approval-mozilla-beta?
Attachment #8665820 - Flags: approval-mozilla-aurora?
Comment on attachment 8665820 [details] [diff] [review]
[MSE] P1. Ensure we stop on the closest keyframe when seeking.

Seems to be an important issue, taking it.
Attachment #8665820 - Flags: approval-mozilla-beta?
Attachment #8665820 - Flags: approval-mozilla-beta+
Attachment #8665820 - Flags: approval-mozilla-aurora?
Attachment #8665820 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: