1400254 - Losing Audio sync with video by using timestampOffset in MSE SourceBuffer

Reporter

Description

•

7 years ago

User Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:55.0) Gecko/20100101 Firefox/55.0 Build ID: 20170824053622 Steps to reproduce: Precondition SampleRate=48000Hz, BitRate=192000, BitsPerSample=16 * add an audio part to SourceBuffer timeline with e.g. timestampOffset=-293.72, appendWindowStart=0, appendWindowEnd=5.36 * add the same part to timeline with timestampOffset=-288.36 (audio is placed after the previous), appendWindowStart=5.36, appendWindowEnd=10.72 * Example created: http://www.digamailboxdepot.net:8090/mse-audio-splice.html Actual results: A little gap is between the audio parts, which leads to loose the synchronisation with video with mutliple clips (more than 2 as in STR) Expected results: No gap shall be there as in Chrome Browser.

mkleinschmidt

Reporter

Comment 1

•

7 years ago

I forgot to tell, that the gap is visualized in the example http://www.digamailboxdepot.net:8090/mse-audio-splice.html in the "Buffer:" line. In FF there are 2 parts, in Chrome there is only one part.

Kohei Yoshino

Updated

•

7 years ago

Component: Untriaged → Audio/Video: Playback

Product: Firefox → Core

Jean-Yves Avenard [:jya]

Assignee

Comment 2

•

7 years ago

Chrome implementation of the buffered attribute is incorrect, as well as their timestampOffset.. So how it behaves and comparing to Firefox is meaningless.

Jean-Yves Avenard [:jya]

Assignee

Updated

•

7 years ago

Updated

•

7 years ago

Assignee: nobody → jyavenard

Jean-Yves Avenard [:jya]

Assignee

Comment 3

•

7 years ago

Some analysis: Caveat: the use of appendWindowStart and appendWindowEnd do not allow to add partial frames to the source buffer. See https://w3c.github.io/media-source/#sourcebuffer-coded-frame-processing if a frame start time is before appendWindowStart or if the frame end time is after appendWindowEnd, the entire frame is dropped. (see https://w3c.github.io/media-source/#sourcebuffer-coded-frame-processing step 8 and 9) There is a provision to perform audio splices, but this is a "MAY support" and isn't a normative requirement. Step by steps run: first appendBuffer criterias are: timestampOffset = -293.72s appendWindowStart = 0s appendWindowEnd = 5.36s One thing to keep in mind, is that samples use integers for their timestamp in a mp4, while timestampOffset is a floating point number. -293.72 has no exact representation in integers microseconds, the closest value is -293719999us. So this is the value we will use in the calculations below (this rounding has no effect in the actual calculations, even if we had used floating point the end results would have been the same). The use of integers isn't prohibited by the spec: "Implementations don't have to internally store timestamps in a double precision floating point representation. This representation is used here because it is the represention for timestamps in the HTML spec." So the first samples added to the source buffer will be: (times are in microseconds) ProcessFrames: Processing audio/mp4a-latm frame(pts:293738666 end:293760000, dts:293738666, duration:21334, kf:1) as: 293738666 - 293719999 = 18667us which is >= appendWindowStart and < appendWindowEnd The previous sample in the stream had a pts of 293717333us: 293717333 - 293719999 = -2666us, as this is less than append window start, the sample is discarded "If presentation timestamp is less than appendWindowStart, then set the need random access point flag to true, drop the coded frame, and jump to the top of the loop to start processing the next coded frame. " The last sample added to the interval: frame(pts:299050666 end:299072000, dts:299050666, duration:21334, kf:1) 299050666 - 293719999 = 5330667 and 299072000 = 5352001 the next sample times are: frame(pts:299072000 end:299093333, dts:299072000, duration:21333, kf:1) 299072000 - 293719999us = 5352001 and 299093333 - 293719999us = 5373334 while 5352001 is >= appendWindowStart , we have the end time > appendWindowEnd per spec: "If frame end timestamp is greater than appendWindowEnd, then set the need random access point flag to true, drop the coded frame, and jump to the top of the loop to start processing the next coded frame. " so the frame is dropped. As such, the first operation gives us a buffered range of [0.018667, 5.352001] The second appendBuffer uses the following criteria: timestampOffset is -288.36s (rounded to -288359999us) appendWindowStart = 5.36s appendWindowEnd = 10.72s The first frame added with those criterias is: frame(pts:293738666 end:293760000, dts:293738666, duration:21334, kf:1) as 293738666 - 288359999us = 5378667us and 293760000 - 288359999 = 5400001us 5378667us is >= appendWindowEnd and 5400001us is < appendWindowEnd the previous frame being: frame(pts:293717333 end:293738666, dts:293717333, duration:21333, kf:1) 293717333 - 288359999us = 5357334us and 293738666 - 288359999us = 5378667us 5357334us is less than appendWindowEnd so the frame is dropped "If presentation timestamp is less than appendWindowStart, then set the need random access point flag to true, drop the coded frame, and jump to the top of the loop to start processing the next coded frame. " So this explains why we have a buffered range with a gap from 5.352001s and 5.378667s Consequently the last frame added to the source buffer is: frame(pts:299050666 end:299072000, dts:299050666, duration:21334, kf:1) 299050666 - 288359999 = 10690667 and 299072000 - 288359999 = 10712001 the next frame being: frame(pts:299072000 end:299093333, dts:299072000, duration:21333, kf:1) 299072000 - 288359999 = 10712001 and 299093333 - 288359999 = 10733334us 10733334us is > 10.72s, and so the frame is dropped. And so our final buffered range is: [0.018667, 5.352001), (5.378667, 10.712001) Exactly as per spec. So the issue here is that no partial frames will be added to a source buffer when using appendWindowStart and appendWindowEnd. *However*, you can still achieve what you want to do, because while no partial frame is ever added, a partial frame is never removed either, even when adding frames that overlaps (https://w3c.github.io/media-source/#sourcebuffer-coded-frame-processing step 13) We only need to ensure that a frame in the interval [5.352001, 5.378667] isn't removed. This can be done by setting appendWindowEnd to 5.378667 in the first place. So this bug should be closed as invalid. the behaviour is exactly as per spec. I will make an amendment to our code however, we have a workaround to allow frames *near* appendWindowStart, and so here we would allow to add to the source buffer the sample with a start time of -2666us.. I will remove this workaround if appendWindowEnd is set to a different value than the default. Also note that the gap should have no impact on A/V sync. The timestamps are still valid.

Comment hidden (mozreview-request)

Jean-Yves Avenard [:jya]

Assignee

Comment 6

•

7 years ago

example with timestamps fix: https://jyavenard.github.io/htmltests/tests/mse_mp4/1400254.html

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Updated

•

7 years ago

Summary: Loosing Audio sync with video by using timestampOffset in MSE SourceBuffer → Losing Audio sync with video by using timestampOffset in MSE SourceBuffer

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Comment 7

•

7 years ago

mozreview-review

Comment on attachment 8909106 [details] Bug 1400254 - Don't apply fuzz workaround when appendWindowEnd is set. https://reviewboard.mozilla.org/r/180700/#review185774

Attachment #8909106 - Flags: review?(gsquelart) → review+

Pulsebot

Comment 8

•

7 years ago

Pushed by jyavenard@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/601946972c44 Don't apply fuzz workaround when appendWindowEnd is set. r=gerald

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 9

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/601946972c44

Status: UNCONFIRMED → RESOLVED

Closed: 7 years ago

status-firefox57: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla57

mkleinschmidt

Reporter

Comment 10

•

7 years ago

(In reply to Jean-Yves Avenard [:jya] from comment #3) > Some analysis: > > Caveat: the use of appendWindowStart and appendWindowEnd do not allow to add > partial frames to the source buffer. > See https://w3c.github.io/media-source/#sourcebuffer-coded-frame-processing > if a frame start time is before appendWindowStart or if the frame end time > is after appendWindowEnd, the entire frame is dropped. > (see https://w3c.github.io/media-source/#sourcebuffer-coded-frame-processing > step 8 and 9) > > There is a provision to perform audio splices, but this is a "MAY support" > and isn't a normative requirement. > > > Step by steps run: > first appendBuffer criterias are: > timestampOffset = -293.72s > appendWindowStart = 0s > appendWindowEnd = 5.36s > > One thing to keep in mind, is that samples use integers for their timestamp > in a mp4, while timestampOffset is a floating point number. -293.72 has no > exact representation in integers microseconds, the closest value is > -293719999us. So this is the value we will use in the calculations below > (this rounding has no effect in the actual calculations, even if we had used > floating point the end results would have been the same). > The use of integers isn't prohibited by the spec: > "Implementations don't have to internally store timestamps in a double > precision floating point representation. This representation is used here > because it is the represention for timestamps in the HTML spec." > > > So the first samples added to the source buffer will be: (times are in > microseconds) > ProcessFrames: Processing audio/mp4a-latm frame(pts:293738666 end:293760000, > dts:293738666, duration:21334, kf:1) > > as: 293738666 - 293719999 = 18667us which is >= appendWindowStart and < > appendWindowEnd > > The previous sample in the stream had a pts of 293717333us: > 293717333 - 293719999 = -2666us, as this is less than append window start, > the sample is discarded > "If presentation timestamp is less than appendWindowStart, then set the need > random access point flag to true, drop the coded frame, and jump to the top > of the loop to start processing the next coded frame. " > > The last sample added to the interval: > frame(pts:299050666 end:299072000, dts:299050666, duration:21334, kf:1) > 299050666 - 293719999 = 5330667 and 299072000 = 5352001 > > the next sample times are: > frame(pts:299072000 end:299093333, dts:299072000, duration:21333, kf:1) > 299072000 - 293719999us = 5352001 and 299093333 - 293719999us = 5373334 > while 5352001 is >= appendWindowStart , we have the end time > > appendWindowEnd > per spec: > "If frame end timestamp is greater than appendWindowEnd, then set the need > random access point flag to true, drop the coded frame, and jump to the top > of the loop to start processing the next coded frame. " > > so the frame is dropped. > > As such, the first operation gives us a buffered range of [0.018667, > 5.352001] > > The second appendBuffer uses the following criteria: > timestampOffset is -288.36s (rounded to -288359999us) > appendWindowStart = 5.36s > appendWindowEnd = 10.72s > > The first frame added with those criterias is: > frame(pts:293738666 end:293760000, dts:293738666, duration:21334, kf:1) > as > 293738666 - 288359999us = 5378667us and 293760000 - 288359999 = 5400001us > 5378667us is >= appendWindowEnd and 5400001us is < appendWindowEnd > > the previous frame being: > frame(pts:293717333 end:293738666, dts:293717333, duration:21333, kf:1) > 293717333 - 288359999us = 5357334us and 293738666 - 288359999us = 5378667us > 5357334us is less than appendWindowEnd so the frame is dropped > "If presentation timestamp is less than appendWindowStart, then set the need > random access point flag to true, drop the coded frame, and jump to the top > of the loop to start processing the next coded frame. " > > So this explains why we have a buffered range with a gap from 5.352001s and > 5.378667s > > Consequently the last frame added to the source buffer is: > frame(pts:299050666 end:299072000, dts:299050666, duration:21334, kf:1) > 299050666 - 288359999 = 10690667 and 299072000 - 288359999 = 10712001 > > the next frame being: > frame(pts:299072000 end:299093333, dts:299072000, duration:21333, kf:1) > 299072000 - 288359999 = 10712001 and 299093333 - 288359999 = 10733334us > 10733334us is > 10.72s, and so the frame is dropped. > > And so our final buffered range is: > [0.018667, 5.352001), (5.378667, 10.712001) > Exactly as per spec. > > So the issue here is that no partial frames will be added to a source buffer > when using appendWindowStart and appendWindowEnd. > > *However*, you can still achieve what you want to do, because while no > partial frame is ever added, a partial frame is never removed either, even > when adding frames that overlaps > (https://w3c.github.io/media-source/#sourcebuffer-coded-frame-processing > step 13) > We only need to ensure that a frame in the interval [5.352001, 5.378667] > isn't removed. > > This can be done by setting appendWindowEnd to 5.378667 in the first place. > > So this bug should be closed as invalid. the behaviour is exactly as per > spec. > > I will make an amendment to our code however, we have a workaround to allow > frames *near* appendWindowStart, and so here we would allow to add to the > source buffer the sample with a start time of -2666us.. > I will remove this workaround if appendWindowEnd is set to a different value > than the default. > > Also note that the gap should have no impact on A/V sync. The timestamps are > still valid. Thanks very much for the explanation! What I also noticed from you is that appendWindowEnd value has to be larger to get the expected audio (>= appendWindowStart, < appendWindowEnd).

Bugzilla

Losing Audio sync with video by using timestampOffset in MSE SourceBuffer

Categories

(Core :: Audio/Video: Playback, defect)

Tracking

()

People

(Reporter: mkleinschmidt, Assigned: jya)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Comment 1

Updated

Comment 2

Updated

Updated

Comment 3

Comment 4

Comment 5

Comment 6

Updated

Comment 7

Comment 8

Comment 9

Comment 10

Attachment

General

Description

File Name

Content Type