Open Bug 1703812 Opened 1 month ago Updated 20 days ago

Use edit list to trim decoder delay in AAC files

Categories

(Core :: Audio/Video: Playback, defect)

Tracking

()

People

(Reporter: padenot, Assigned: padenot, NeedInfo)

Details

Attachments

(11 files)

390.61 KB, audio/mp4
48 bytes, text/x-phabricator-request (10 attachments)
Attached audio test.m4a

STR:

Expected:

  • The duration of the AAC file is exactly 45s

Actual:

  • It's a bit longer

Apparently we're not trimming correctly based on the edit list, so the file is 1024 frames too long. I've attached the file here for posterity.

This matters for looping precisely with AudioBufferSourceNode or <audio src=... loop>.

Assignee: nobody → padenot

Not only that, but we also don't trim the end correctly. https://developer.apple.com/library/archive/documentation/QuickTime/QTFF/QTFFAppenG/QTFFAppenG.html has lots of information on what to do.

In Gecko, with the crate mp4parse, the number of frames to skip at the beginning is correctly surfaced in the media_time. This is the encoder delay.

The number of padding frames can be computed in two ways: we can use the info in the stts box, or we can use the media duration.

stts box

It has two entries: a packet count (which we need to multiply by the size of the packet, which is always 1024), and then the number of "valid" frames in the last packet. Taking https://github.com/kunstmusik/decodeAudioDataTest/blob/main/test441.m4a as an example and opening it in MP4 Explorer, we have:

  • 1938 packets of 1024 frames = 1984512 frames
  • 1 packet of 1012 frames = 1012 frames
  • an encoder delay in the edit list of 1024 frames

It goes:

1938 * 1024 + 1 * 1012 - 1024 = 1984500 frames
1984500 frames / 44100 Hz = 45.0 seconds

which is the correct duration and frame count.
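
The arithmetic above can be sketched in a few lines of Python. This is only an illustration, not Gecko or mp4parse code: the function name and the (sample_count, sample_delta) tuple layout are assumptions, and the values are the ones read from test441.m4a above.

```python
def valid_frames_from_stts(stts_entries, encoder_delay):
    """Return the number of frames left once the encoder delay is trimmed.

    stts_entries: list of (sample_count, sample_delta) pairs, i.e. a number
    of packets and the number of frames each of those packets lasts.
    encoder_delay: frames to skip at the start (from the edit list).
    """
    total = sum(count * delta for count, delta in stts_entries)
    if encoder_delay > total:
        raise ValueError("encoder delay exceeds the total frame count")
    return total - encoder_delay


# Values from test441.m4a, as shown by MP4 Explorer.
entries = [(1938, 1024), (1, 1012)]
frames = valid_frames_from_stts(entries, encoder_delay=1024)
duration_s = frames / 44100
print(frames, duration_s)
```

The padding never has to be computed explicitly here: the short last entry already accounts for it.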

It's unclear to me for now where this stts box is surfaced.

The media duration

It is also in the edit list as Segment duration, and in the same file it's 45000 (milliseconds).

45000ms / 1e3 * 44100 = 1984500 frames

which is the correct frame count.
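
The same conversion, written with integer arithmetic so that no floating-point rounding can creep into the frame count. A minimal sketch: the function name is hypothetical, and it assumes the Segment duration is expressed in a timescale of 1000 units per second, which is what the millisecond value in this file implies.

```python
def segment_duration_to_frames(segment_duration, timescale, sample_rate):
    """Convert a duration in `timescale` units per second to audio frames.

    Rounds to the nearest frame when the duration does not fall exactly
    on a frame boundary.
    """
    if timescale <= 0:
        raise ValueError("timescale must be positive")
    return (segment_duration * sample_rate + timescale // 2) // timescale


frames = segment_duration_to_frames(45000, timescale=1000, sample_rate=44100)
print(frames)
```

A 1 ms timescale cannot represent every frame boundary at 44100 Hz, so the result is only exact when the duration happens to be a whole number of frames, as it is for this file.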

We have the encoder delay in the edit list, so we skip those frames, and then we output frames until 1984500 frames have been output by the decoder, and we discard the rest.
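
The skip-then-count approach described above could look roughly like this. It is a sketch under assumptions, not the actual decoder code: the function name is made up, and decoded packets are modelled as plain Python lists of frames.

```python
def trim_packets(decoded_packets, encoder_delay, valid_frames):
    """Yield decoded packets with the encoder delay and padding removed.

    decoded_packets: iterable of lists of frames, in decode order.
    encoder_delay: number of frames to drop at the start.
    valid_frames: number of frames to output before discarding the rest.
    """
    to_skip = encoder_delay
    remaining = valid_frames
    for packet in decoded_packets:
        if remaining == 0:
            break  # everything after this point is padding
        if to_skip >= len(packet):
            to_skip -= len(packet)  # whole packet is encoder delay
            continue
        packet = packet[to_skip:]   # drop the rest of the delay
        to_skip = 0
        packet = packet[:remaining]  # cut the padding in the last packet
        remaining -= len(packet)
        yield packet


# Toy example: 4 packets of 4 frames, 4 frames of delay, 10 valid frames.
packets = [list(range(i, i + 4)) for i in range(0, 16, 4)]
trimmed = [f for p in trim_packets(packets, 4, 10) for f in p]
print(trimmed)
```

The delay does not have to be a multiple of the packet size; a delay that ends in the middle of a packet is handled by the slice.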

Unfortunately, we don't use this Segment duration from the edit list, and it doesn't look like it is surfaced by the mp4 parser. The duration member alongside the encoder media_start comes from the Media Header Box, not from the edit list, so it's not the correct duration.

What is the preferable solution? Using the stts box looks safer, but how is it exposed? It seems to be exposed only in terms of packet byte ranges or something similar.

Flags: needinfo?(jbauman)
Flags: needinfo?(bvandyk)
Flags: needinfo?(kinetik)

To clarify, by "edit list", you mean the elst box from ISOBMFF (ISO/IEC 14496-12:2015) § 8.6.6, right?

We parse that in mp4parse here, but instead of storing it, we only use it as part of the calculation of mp4parse::Track::media_time, which is exposed via mp4parse_capi::Mp4parseTrackInfo::media_time, accessible via FFI mp4parse_get_track_info and mp4parse_get_indice_table.

As for the stts box, that is stored in mp4parse::Track::stts, but not in mp4parse_capi::Mp4parseTrackInfo. It is used in the calculation of mp4parse_get_indice_table, but I'm not sure if that provides what you need.

I'm not too familiar with either of these boxes, but it would be simple to modify mp4parse_capi to expose either or both directly, or we could add some calculation in mp4parse and expose the result via FFI. Let me know what you prefer, and I'm happy to work with you on getting that into mp4parse-rust.

Flags: needinfo?(jbauman) → needinfo?(padenot)

This is explained in
https://developer.apple.com/library/archive/documentation/QuickTime/QTFF/QTFFAppenG/QTFFAppenG.html.

From mp4parse's point of view, the decoder delay is in the media_time struct;
this corresponds to the offset specified in an elst box.

Instead of computing the padding, we find the media duration (excluding
delay/padding) and send that to the codec, which takes care of trimming
appropriately.

Depends on D111818

This is frame-accurate, because it matters for musical applications, when
looping a file.

The files have been produced by first getting a 0.5s wav file containing a
stereo sine wave:

sox -V -r 44100 -n -b 16 -c 2 half-a-second-2ch-44100.wav synth 0.5 sin 330 vol -10db
sox -V -r 48000 -n -b 16 -c 2 half-a-second-2ch-48000.wav synth 0.5 sin 330 vol -10db

and then encoding it using ffmpeg, down-mixing to get both a mono and a stereo
file:

ffmpeg -i half-a-second-2ch-44100.wav half-a-second-2ch-44100.m4a
ffmpeg -i half-a-second-2ch-44100.wav -ac 1 half-a-second-1ch-44100.m4a
ffmpeg -i half-a-second-2ch-48000.wav half-a-second-2ch-48000.m4a
ffmpeg -i half-a-second-2ch-48000.wav -ac 1 half-a-second-1ch-48000.m4a
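
For reference, the frame counts a frame-accurate decode of these files should produce can be worked out as below. This is a sketch with hypothetical helper names; it uses the nominal rates in the file names, and the padding figure assumes an encoder delay of 1024 frames and 1024-frame packets, which the encoder actually used may not match.

```python
from fractions import Fraction


def expected_frames(duration_s, sample_rate):
    """Exact number of frames for a duration, or raise if not a whole number."""
    frames = Fraction(duration_s) * sample_rate
    if frames.denominator != 1:
        raise ValueError("duration is not a whole number of frames")
    return int(frames)


def padding_frames(valid_frames, encoder_delay=1024, packet_size=1024):
    """Frames of padding needed to fill the last packet (assumed sizes)."""
    total = valid_frames + encoder_delay
    packets = -(-total // packet_size)  # ceiling division
    return packets * packet_size - total


for rate in (44100, 48000):
    frames = expected_frames(Fraction(1, 2), rate)
    print(rate, frames, padding_frames(frames))
```

Under the same assumptions, the 45-second file discussed earlier in this bug gets 12 frames of padding, which matches its last packet holding 1012 valid frames out of 1024.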

Depends on D111822

(In reply to Jon Bauman [:jbauman:] from comment #2)

> To clarify, by "edit list", you mean the elst box from ISOBMFF (ISO/IEC 14496-12:2015) § 8.6.6, right?

Yes. I ended up using the indice table to fix this. It's slightly convoluted, but it works well. Thanks!

Flags: needinfo?(padenot)

https://developer.apple.com/library/archive/documentation/QuickTime/QTFF/QTFFAppenG/QTFFAppenG.html, Historical Solution—Implicit Encoder Delay

This makes Firefox output the same PCM as Chrome. Safari gets it right, though;
I'm not sure yet how they do it.

Depends on D111823

This is per-packet info and doesn't belong in level 4; this moves it to verbose.

Depends on D111865

Flags: needinfo?(kinetik)
Attachment #9215434 - Attachment description: WIP: Bug 1703812 - Trim the decoder delay and the padding when decoding AAC using the AT decoder. → Bug 1703812 - Trim the decoder delay and the padding when decoding AAC using the AT decoder. r?bryce
Attachment #9215434 - Attachment description: Bug 1703812 - Trim the decoder delay and the padding when decoding AAC using the AT decoder. r?bryce → WIP: Bug 1703812 - Trim the decoder delay and the padding when decoding AAC using the AT decoder.