Not only that, but we also don't trim the end correctly. https://developer.apple.com/library/archive/documentation/QuickTime/QTFF/QTFFAppenG/QTFFAppenG.html has lots of information on what to do.
In Gecko, with the mp4parse crate, the number of frames to skip at the beginning is correctly surfaced in media_time. This is the encoder delay.
The number of padding frames can be computed in different ways: we can use the info in the stts box, or we can use the media duration.
The stts box has two entries: a packet count (that we need to multiply by the size of the packet, which is always 1024 frames), and then a number of "valid" frames in the last packet. Taking https://github.com/kunstmusik/decodeAudioDataTest/blob/main/test441.m4a as an example and opening it in MP4 Explorer, we have:
- 1938 packets of 1024 frames = 1984512 frames
- 1 packet of 1012 frames = 1012 frames
- an encoder delay in the edit list of 1024 frames
1938 * 1024 + 1 * 1012 - 1024 = 1984500 frames
1984500 frames / 44100 Hz = 45.0 seconds
which is the correct duration and frame count.
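To make the arithmetic above easy to re-check, here is a minimal sketch. It assumes the stts entries are available as (packet count, frames per packet) pairs; `valid_frame_count` is a hypothetical helper for illustration, not something mp4parse or Gecko exposes.

```python
def valid_frame_count(stts_entries, encoder_delay):
    """Total frames described by stts, minus the encoder delay from the edit list.

    stts_entries: list of (packet_count, frames_per_packet) pairs (assumed shape).
    encoder_delay: number of priming frames to skip at the start.
    """
    total = sum(count * frames for count, frames in stts_entries)
    return total - encoder_delay


# Values read from test441.m4a in MP4 Explorer.
stts_entries = [(1938, 1024), (1, 1012)]
frames = valid_frame_count(stts_entries, encoder_delay=1024)
duration_s = frames / 44100

print(frames, duration_s)  # 1984500 45.0
```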
It's unclear to me for now where this stts box is surfaced.
The media duration is also in the edit list, as Segment duration, and in the same file it's 45000 (milliseconds).
45000ms / 1e3 * 44100 = 1984500 frames
which is the correct frame count.
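The same conversion as a small sketch, using integer math so that no float rounding sneaks in. It assumes the Segment duration is expressed in a timescale of 1000 units per second (milliseconds, as in this file); `segment_duration_to_frames` is a hypothetical name, not an existing API.

```python
def segment_duration_to_frames(segment_duration, timescale, sample_rate):
    """Convert an edit list segment duration to a frame count.

    segment_duration: duration in `timescale` units.
    timescale: units per second (assumed to be 1000 for this file).
    sample_rate: audio sample rate in Hz.
    Rounds to the nearest frame using integer arithmetic only.
    """
    return (segment_duration * sample_rate + timescale // 2) // timescale


frame_count = segment_duration_to_frames(45000, 1000, 44100)
print(frame_count)  # 1984500
```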
We have the encoder delay in the edit list, so we skip those frames, and then we output frames until 1984500 frames have been output by the decoder, and we discard the rest.
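A rough sketch of that skip-then-count logic, on toy data. It assumes the decoder hands back packets as sequences of frames; `trim_decoded` is a hypothetical helper and not how Gecko's decoder is actually structured.

```python
def trim_decoded(packets, encoder_delay, valid_frames):
    """Yield decoded packets with the encoder delay and end padding removed.

    packets: iterable of sequences of decoded frames (assumed shape).
    encoder_delay: frames to drop at the start of the stream.
    valid_frames: frames to output once the delay has been skipped.
    """
    to_skip = encoder_delay
    remaining = valid_frames
    for packet in packets:
        if remaining <= 0:
            break  # everything after this point is padding
        if to_skip:
            dropped = min(to_skip, len(packet))
            packet = packet[dropped:]
            to_skip -= dropped
        packet = packet[:remaining]
        remaining -= len(packet)
        if packet:
            yield packet


# Toy stream: 3 packets of 4 frames, 3 frames of delay, 6 valid frames.
toy_packets = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
trimmed = [f for p in trim_decoded(toy_packets, 3, 6) for f in p]
print(trimmed)  # [3, 4, 5, 6, 7, 8]
```

For the file above, this would be called with an encoder delay of 1024 and 1984500 valid frames.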
Unfortunately we don't use this Segment duration from the edit list; it doesn't look like it's surfaced by the mp4 parser. The duration member alongside the encoder media_start is from the Media Header Box, and not from the edit list, so it's the incorrect time.
What is the preferable solution? Using the stts box looks safer, but how is it exposed? It seems to be exposed only in terms of packet byte ranges or something?