Closed Bug 1241057 Opened 8 years ago Closed 8 years ago

Abnormal timestamps of sine.webm

Categories

(Core :: Audio/Video: Playback, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: jwwang, Unassigned)

References

Details

After bug 1239223 is fixed, we have:

D/MediaSample Decoder=7fcc58145b70 OnAudioDecoded [418666,439999] disc=0
D/MediaSample Decoder=7fcc58145b70 OnAudioDecoded [440000,461333] disc=0
D/MediaSample Decoder=7fcc58145b70 OnAudioDecoded [461333,482666] disc=0
D/MediaSample Decoder=7fcc58145b70 OnAudioDecoded [482656,503989] disc=0

The start time of sample 4 is smaller than the end time of sample 3 (482656 < 482666) which looks like a rounding error to me.
Depends on: 1239223
Flags: needinfo?(jyavenard)
what's the question?
Flags: needinfo?(jyavenard)
The timestamp and duration of samples is identical to what the WebMReader used to return (I made sure of that)

It also matches what comes out of ffprobe
Shouldn't the start time (482656) be greater than the end time (482666) of the previous sample?
this is what's encoded inside the webm.

With bug 1240201, frames time and duration found within a block are now properly decoded (there are 8 frames per webm SimpleBlock)

first block start at 0s
2nd start at 0.141324s
3rd start at 0.312s
4th start at 0.482656s

it's vorbis, and frames have individually set duration. The start of the first frame of the webm block is of the time set for that block.

3rd block has 8 frames, each with a duration of 21333us ; so total = 21333 * 8 = 170664 + 312000 = so end time really is 482664 (note bug 1240201 does remove a rounding issue, hence why it's not the 482666)

But the 4th block start in the webm container is set at 482656. This value comes straight from nestegg (actualy value was 482656608ns)

Nothing appears wrong with our code (either demuxer or decoder), but how this webm was encoded.

Mathew, is there anything I miss ?
Flags: needinfo?(kinetik)
There are rounding errors in the timecodes stored in the WebM (and any format that doesn't deal with this issue directly).  The size of the error depends on the container and choices made during muxing.  There are also potentially rounding errors converting between {nano,micro,milli}seconds and a sample count.

In this case, we've got a Vorbis sample rate of 48000 Hz. The muxer has chosen a timecode scale of 20832, and note 1e9/48000 = 20833.3333(...) nanoseconds/sample is pretty close.

The timecodes in the file are as follows:

Muxed TC   Scaled TC
       0           0
    6784   141324288
   14977   312000864
   23169   482656608

Altering the muxed TC by -/+ 1 would result in scaled TCs of 482635776 or 482677440.  None of which are exactly the 482664xxx value that the duration of the decoded samples in the previous block end up at.
Flags: needinfo?(kinetik)
so should we close this bug as invalid ?

or should we ignore the timecode in the container (other than the first), and only rely on the samples duration? (except when seeking)?
(In reply to Jean-Yves Avenard [:jya] from comment #6)
> so should we close this bug as invalid ?

Is it causing a real problem?  The fact that there are various rounding errors in the media stack isn't really handled in a holistic way or represented anywhere... I'm not sure what to do about it.

> or should we ignore the timecode in the container (other than the first),
> and only rely on the samples duration? (except when seeking)?

It's not clear that'd be safe for A/V sync.  There is code to insert audio silence when there are gaps in the timestamps.
I don't think it's causing problem. File certainly play okay for me.

JW, closing as invalid.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
(In reply to Matthew Gregan [:kinetik] from comment #7)
> It's not clear that'd be safe for A/V sync.  There is code to insert audio
> silence when there are gaps in the timestamps.

But we don't drop audio frames with staled timestamps. Consider the following audio sample:
[0,100]
[100,200]
[150,250]
[200,300]

The final audio time would be 400 instead of 300 and cause A/V sync issues. Dropping staled frames is only possible when the timestamps is reliable.
(In reply to JW Wang [:jwwang] from comment #9)
> (In reply to Matthew Gregan [:kinetik] from comment #7)
> > It's not clear that'd be safe for A/V sync.  There is code to insert audio
> > silence when there are gaps in the timestamps.

Here I mean when there's a gap, e.g.

[0, 100]
[101, 200]

...we insert silence between 100 and 101.  This was once necessary to make A/V sync correct for some files, where (I guess) the muxer chose to introduce timecode gaps instead of allowing the timecodes to overlap slightly.

I guess you could argue that if we played the audio back-to-back without inserting silence it would be correct, but then we'd have to change the video frame timecodes to cope.

> But we don't drop audio frames with staled timestamps. Consider the
> following audio sample:
> [0,100]
> [100,200]
> [150,250]
> [200,300]
> 
> The final audio time would be 400 instead of 300 and cause A/V sync issues.
> Dropping staled frames is only possible when the timestamps is reliable.

I'm not sure I follow, I don't recall dropping audio for sync purposes - we drive video sync off of audio.  If we did, this might be an issue (but probably we would resample to adjust drift rather than hard drop, dunno).  I don't know what the ultimate solution is here - I'm only explaining why the timecodes from the file are the way they are.

Yeah, it's wrong, I don't know what the right fix is without spending significant time investigating... and without a concrete problem this is causing, it's probably not going to end up on the top of anyone's list.

Maybe wontfix (for now) is more appropriate than invalid.
Resolution: FIXED → WONTFIX
Wontfix is fine to me.

IIRC, jya used to have some A/V sync issues where video is ahead by about 1 sec after seeking. If it is caused by abnormal timestamps of audio frames illustrated above, we will then have to consider again how to mitigate redundancy or gaps in audio frames.
The fixing of bug 1240201 (where we also properly calculate the timestamp and duration when just demuxing) caused some failure for exactly that reason with the webm/vorbis samples.

the time between two webm blocks do not match the sum of the samples in those blocks.
So this cause the buffered range reported to no be continuous:
example:
expected "{ [0.000, 6.050) }" but got "{ [0.013, 0.384) [0.408, 0.779) [0.803, 2.381) [2.405, 3.565) [3.589, 3.960) [3.984, 4.378) [4.402, 5.562) [5.586, 5.957) [5.981, 6.050) }

0.013 because I went by the advice of the vorbis author and considered that the first vorbis block wouldn't produce any audio until the second was seen, and from then the number of samples returned by the decoder would be previous_block/4 + next_block/4

So i will have to handle things differently. For a start to prevent the problem mentioned above where we could have a rounding error between the sum of the duration samples and the time found in the container, I will make it so that the duration of the last sample in the block is always aligned to end where the new block start.

I will also not attempt to determine the duration of a sample if there's only one in a the webm block (as in effect we only care that two samples don't have the same start time and duration).
See Also: → 1240201
You need to log in before you can comment on or make changes to this bug.