Add FLAC (codec) support in MP4

RESOLVED FIXED

Status

()

Core
Audio/Video: Playback
P3
normal
Rank:
10
RESOLVED FIXED
a year ago
11 months ago

People

(Reporter: jya, Unassigned)

Tracking

unspecified
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(platform-rel -)

Details

(Whiteboard: [platform-rel-BBC][platform-rel-BBCNews][platform-rel-Vimeo])

Attachments

(1 attachment, 4 obsolete attachments)

(Reporter)

Description

a year ago
Aim is to be able to have flac lossless audio stream embedded in the mp4 container.
(Reporter)

Updated

a year ago
Flags: needinfo?(tterribe)
Examples to crib from:

https://github.com/Netflix/vp9-dash/raw/master/Downloads/VPCodecISOMediaFileFormatBinding.pdf
https://www.opus-codec.org/docs/opus_in_isobmff.html
Flags: needinfo?(tterribe)

Comment 2

a year ago
(In reply to Timothy B. Terriberry (:derf) from comment #1)
> Examples to crib from:
> 
> https://github.com/Netflix/vp9-dash/raw/master/Downloads/
> VPCodecISOMediaFileFormatBinding.pdf
> https://www.opus-codec.org/docs/opus_in_isobmff.html

Are we in fact putting Opus in ISOBMFF, or is it just a superset example?
(In reply to cmontgomery from comment #2)
> Are we in fact putting Opus in ISOBMFF, or is it just a superset example?

Yes. Bug 1240413. I'm not sure what you mean by "superset example".

Comment 4

a year ago
(In reply to Timothy B. Terriberry (:derf) from comment #3)
> (In reply to cmontgomery from comment #2)
> > Are we in fact putting Opus in ISOBMFF, or is it just a superset example?
> 
> Yes. Bug 1240413. I'm not sure what you mean by "superset example".

I meant 'are we going for full ISOBMFF with all the associated structured metadata', or 'are we only wrapping things in enough MP4 for valid streaming'?  Having talked to Anthony at the all-hands, he seemed to think we only needed the streaming case.  

OTOH, if the BMFF metadata is mostly a spec cut-and-paste, then the spec should include it.

Comment 5

a year ago
Oh, I see, I'm just confused.  The only real 'metadata' there is explicitly extending/specifying the Movie Box.  I thought there was a lot more there.

Comment 6

a year ago
We're really looking forward to seeing this spec, and the subsequent implementation of course.

Please let me know in the first instance if there's anything we can do to help the specification process - we are happy to review etc.
Priority: -- → P2
Priority: P2 → P3

Comment 7

a year ago
Created attachment 8779826 [details]
Encapsulation of FLAC in ISO Base Media File Format.txt

Initial draft of FLAC in ISO BMFF

Comment 8

a year ago
Uploading a first draft now.  Although I'm reasonably certain that what it is trying to say is ~ correct, I'll admit this is my first time working in ISO's pseudocode structure markup, so I may have stumbled over it a bit.

Now is a good time for review and commentary.
Just a few comments, as I am by no means an ISO BMFF expert:

Sections 3.4 and 4.2 contain dangling references to native metadata. I think the explanations at the end of Section 4.3.2 are probably sufficient and you can just delete these.

The MetadataBlocks field in the FLACSpecificBox is redundant with the LastMetadataBlockFlag in the FLACMetadataBlock structure. It also artificially limits FLAC to 255 metadata blocks ("which should be enough for anybody"). I recognize that without it, you can't write the loop in FLACSpecificBox syntax with a fixed upper bound (and in practice the 255 limit is probably fine), but is there any other reason it's there? I'd expect at least that the redundancy be pointed out in the text so that the count is mandated to agree with the flags (the same way the channel count and sample depth are mandated to agree in all of the redundant places they're stored).

4.3.6.2: Is this section necessary? We had one for Opus because Opus does have pre-roll, but I suspect it's usually just not discussed for formats that lack the concept.

4.4.1: "may be present" ... did you mean optional?

Might be good to have an acknowledgments section somewhere crediting Yusuke?

Comment 10

a year ago
(In reply to Timothy B. Terriberry (:derf) from comment #9)
> Just a few comments, as I am by no means an ISO BMFF expert:
> 
> Sections 3.4 and 4.2 contain dangling references to native metadata. I think
> the explanations at the end of Section 4.3.2 are probably sufficient and you
> can just delete these.

probably a good idea.

> The MetadataBlocks field in the FLACSpecificBox is redundant with the
> LastMetadataBlockFlag in the FLACMetadataBlock structure. It also
> artificially limits FLAC to 255 metadata blocks ("which should be enough for
> anybody"). I recognize that without it, you can't write the loop in
> FLACSpecificBox syntax with a fixed upper bound (and in practice the 255
> limit is probably fine), but is there any other reason it's there? I'd
> expect at least that the redundancy be pointed out in the text so that the
> count is mandated to agree with the flags (the same way the channel count
> and sample depth are mandated to agree in all of the redundant places
> they're stored).

That pretty much summarizes the devil on my other shoulder.  I just wanted to avoid having to parse the list to get the length, nothing more.  I'd be happy to do it your way too.

> 4.3.6.2: Is this section necessary? We had one for Opus because Opus does
> have pre-roll, but I suspect it's usually just not discussed for formats
> that lack the concept.

OK.

> 4.4.1: "may be present" ... did you mean optional?

This is copying Yusuke's wording (and his diagram for the most part), but I did mean 'may or may not be present, depending on what's appropriate in context'. 
 
> Might be good to have an acknowledgments section somewhere crediting Yusuke?

Yes.  I just didn't want people contacting with questions or errata.
(Reporter)

Comment 11

a year ago
(In reply to cmontgomery from comment #10)
> That pretty much summarizes the devil on my other shoulder.  I just wanted
> to avoid having to parse the list to get the length, nothing more.  I'd be
> happy to do it your way too.

the MP4 box already has a size set, so we can deduct the size of all the metadata blocks but substracting 1 (the byte used for the version) from the box size right?

As far as the player is concerned, all it cares about (at least with FF implementation) is the STREAMINFO block, and maybe the vorbis comment. So once it has found that, it can directly jump to the end of the box.

Having said all that, why not make it all identical to the Ogg mapping?
https://xiph.org/flac/ogg_mapping.html

    The one-byte packet type 0x7F
    The four-byte ASCII signature "FLAC", i.e. 0x46, 0x4C, 0x41, 0x43
    A one-byte binary major version number for the mapping, e.g. 0x01 for mapping version 1.0
    A one-byte binary minor version number for the mapping, e.g. 0x00 for mapping version 1.0
    A two-byte, big-endian binary number signifying the number of header (non-audio) packets, not including this one. This number may be zero (0x0000) to signify 'unknown' but be aware that some decoders may not be able to handle such streams.
    The four-byte ASCII native FLAC signature "fLaC" according to the FLAC format specification
    The STREAMINFO metadata block for the stream.

The difference being is that the single FLACSpecificBox box contains all at once which would closely match the flac raw container once you skip the first 13 bytes.

Comment 12

a year ago
(In reply to Jean-Yves Avenard [:jya] from comment #11)
> (In reply to cmontgomery from comment #10)
> > That pretty much summarizes the devil on my other shoulder.  I just wanted
> > to avoid having to parse the list to get the length, nothing more.  I'd be
> > happy to do it your way too.
> 
> the MP4 box already has a size set, so we can deduct the size of all the
> metadata blocks but substracting 1 (the byte used for the version) from the
> box size right?

Yes.

> As far as the player is concerned, all it cares about (at least with FF
> implementation) is the STREAMINFO block, and maybe the vorbis comment. So
> once it has found that, it can directly jump to the end of the box.
> 
> Having said all that, why not make it all identical to the Ogg mapping?

I saw no reason to do so given that the more common native stream doesn't do it that way, nor does Matroska.  

I think I'll just drop the number of blocks field since it's purely redundant.

Monty

Comment 13

a year ago
Created attachment 8782097 [details]
Updated draft incorporating discussion and comments to date
Attachment #8779826 - Attachment is obsolete: true

Comment 14

a year ago
Apologies for the extreme delay in replying to this.

We've reviewed the updated draft and, on the whole, it looks really great. Just a few comments:

In 4.3.1, to make it clear the fields with a '+' bullet point are not extensions to the AudioSampleEntryBox, it would be good to see a comment saying something along the lines of "The fields from the AudioSampleEntryBox shall be set as follows:"

In 4.3.2, the FlacSpecificBox extends Box and has a version field. Normally if versioning is required we would expect FullBox to be the parent since versioning is then built into the box header, with flags set to 0.

Typos:
4.2 Bullet point 5 references a "FLAC packet". Packet does not seem to be defined in the FLAC spec. Should this be "FLAC frame"?
4.3.2 MEATADATA_BLOCK_HEADER -> METADATA_BLOCK_HEADER
4.3.2 contianing -> containing
4.3.2 METADATA_BLOCKDATA -> METADATA_BLOCK_DATA
4.3.3 FLACDSampleEntry -> FLACSampleEntry
Depends on: 1303888
platform-rel: --- → ?
Whiteboard: [platform-rel-BBC][platform-rel-BBCNews]

Comment 15

a year ago
(In reply to David Evans from comment #14)
> Apologies for the extreme delay in replying to this.
> 
> We've reviewed the updated draft and, on the whole, it looks really great.
> Just a few comments:
> 
> In 4.3.1, to make it clear the fields with a '+' bullet point are not
> extensions to the AudioSampleEntryBox, it would be good to see a comment
> saying something along the lines of "The fields from the AudioSampleEntryBox
> shall be set as follows:"

OK, will add.

> In 4.3.2, the FlacSpecificBox extends Box and has a version field. Normally
> if versioning is required we would expect FullBox to be the parent since
> versioning is then built into the box header, with flags set to 0.

If that's the usual convention, we should do it that way.  I will update the spec.

> Typos:
> 4.2 Bullet point 5 references a "FLAC packet". Packet does not seem to be
> defined in the FLAC spec. Should this be "FLAC frame"?

Yes, I've stumbled a bit over some aspect of the two specs using different but overlapping terminology.  I'll clarify (and your assumption is correct).

> 4.3.2 MEATADATA_BLOCK_HEADER -> METADATA_BLOCK_HEADER
> 4.3.2 contianing -> containing
> 4.3.2 METADATA_BLOCKDATA -> METADATA_BLOCK_DATA
> 4.3.3 FLACDSampleEntry -> FLACSampleEntry

Will got these too.

Comment 16

a year ago
> Will got these too.

How ironic.  Ignore the typo with respect to correcting typos.
platform-rel: ? → +

Comment 17

a year ago
By the way, I recommend to register the identifier at http://www.mp4ra.org/request.html.
Whiteboard: [platform-rel-BBC][platform-rel-BBCNews] → [platform-rel-BBC][platform-rel-BBCNews][platform-rel-Vimeo]
Created attachment 8797226 [details]
Draft v0.0.2

I took the liberty of updating the draft in response to comments.

- Removed implicit MetadataBlocks count
- Made FLACSpecificBox extend FullBox (BREAKING CHANGE)
- Added clarifying text to field descriptions.
- Fixed some typos, formatting, and wording.

In implementing this, I found the implicit MetadataBlock count a little strange. Almost everything in mp4 is length-prefixed, usually by wrapping it in a box so I wasn't able to use a fixed loop bound or our normal iterator. However, one does generally have to clip reads to the declared length of the parent boxes, so in practice it was fine, and there is precedent for 'read until the end without box headers' e.g. in AlternateStartupEntry ('alst') and ProgressiveDownloadInfoBox ('pdin'). The current design is probably worthwhile for keeping the native block structure intact for ease of remuxing.
Attachment #8782097 - Attachment is obsolete: true
Attachment #8797226 - Flags: feedback?(monty)
Some review comments from Paranoialmaniac, author of the Opus-in-mp4 spec:

> at a glance, it is unneeded to mention about the edit
> list. flac has no paddings and no requirements for
> removing it by the edit list, isn't it? i think all terms
> and definitions there are unneeded

To clarify, flac is a block-based codec, but there are two internal mechanisms which can reproduce a fixed duration.

(1) Block sizes are variable. Although there is a minimum size of 16 samples, the last two blocks can be adjusted to set any arbitrary length over that. https://xiph.org/flac/format.html#blocking

(2) The STREAMINFO header can optionally indicate the total number of samples (up to 36 bits, or 4 days at 192 kHz).

I suspect, but have not verified, that the reference implementation uses fixed block lengths and sets a total sample number to trim the output. I think removing the edit list definition is fine; it and the 'active track' are dangling references. But we should add text to the encapsulation section to clarify that the decoder will trim the input frames to the correct duration. No offset or padding is necessary and and edit list is purely for post-encode trimming.

> samplerate field, which is 32bit field, in
> AudioSampleEntry is left-shifted by 16bits. so it can
> express up to 65535Hz

This is a pretty serious issue given the popularity of 96 and 192 kHz flac files. ISO mp4 doesn't traditionally support sample rates over 65kHz. There is an AudioSampleRateV1 extension defined in 14496-12:2015 (or 2012/Amd.2:2014) but it is not widely supported. For compatibility, it is apparently common to instead put the largest expressible multiple of the sample rate in this field and then override with codec-specific data. There is precedent for this.

> The samplerate, samplesize and channelcount fields document the default
> audio output playback format for this media. The timescale for an audio
> track should be chosen to match the sampling rate, or be an integer
> multiple of it, to enable sample-accurate timing.
> [...]
> The audio output format (samplerate, samplesize and channelcount fields)
> in the sample entry should be considered definitive only for codecs that
> do not record their own output configuration. If the audio codec has
> definitive information about the output format, it shall be taken as
> definitive; in this case the samplerate, samplesize and channelcount
> fields in the sample entry may be ignored, though sensible values should
> be chosen (for example, the highest possible sampling rate).
> [ISO 14496-12:2012, sec 8.5.2.1, pages 29, 30.]

In the AudioSampleEntryV1 ammendment, the sample-accurate timing choice won out. Paranoialmaniac advises:

> it is complex. the old isobmff spec says samplerate = mdhd.timescale << 16
> the latest has different defintion. see 14496-12:2015 12.2.3.1 Definition
> anyway, for high sampling rates, you should fix the draft. 
> for almost all codecs, samplerate field is useless. if you
> want indicate actual samlerate within the spec of isobmff,
> use AudioSampleEntryV1 and store the actual value in
> sampling rate box 
> (i don't recommend use of AudioSampleEntryV1 for compatibilities)
> [...]
> it's really difficult question :D but 14496-12:2015 says
> "the samplerate field in the sample entry should contain a
> value left‐shifted 16 bits (as for AudioSampleEntry) that
> matches the media timescale, or be an integer division or
> multiple of it.". so, i prefer the maximum value of
> available divisions of the high sampling rate. e.g. for
> 96000Hz, set samplerate to 48000
> and let FLACSpecificBox overrides it
Created attachment 8797684 [details]
Draft v0.0.3

Update draft in response to Paranoialmaniac's comments.

- Removed 'Terms and Definitions' section.
  I decided in the end it was clear enough without mentioning edit lists.

- Describe how to handle high-rate files. I.e. put 48kHz for 96 and 192 kHz 
  bitstreams and read the FLAC Specific Box for the override.

- Removed `aligned(8)` declaration on FLACMetadataBlock.
  While some flac metadata blocks are multiples of 8 bytes in length, many,
  including STREAMINFO VORBIS_COMMENT and PICTURE are not, so unless we add
  some machanism for padding, blocks after the first can end with only byte
  alignment.
Attachment #8797226 - Attachment is obsolete: true
Attachment #8797226 - Flags: feedback?(monty)
Attachment #8797684 - Flags: feedback?(monty)
I've landed copies of the drafts so far in the upstream flac repo for easier reference.

https://git.xiph.org/?p=flac.git;a=blob;f=doc/isoflac.txt
(In reply to cmontgomery from comment #15)
> (In reply to David Evans from comment #14)
> > 4.3.2 MEATADATA_BLOCK_HEADER -> METADATA_BLOCK_HEADER

This typo still appears in three places.
Experimental muxer: https://github.com/kinetiknz/FFmpeg/tree/flac_mp4
Created attachment 8805161 [details]
Draft v0.0.4

Fix typos reported in #22.
Attachment #8797684 - Attachment is obsolete: true
Attachment #8797684 - Flags: feedback?(monty)
Rank: 10
Ralph - what do we need to do before we close this bug?
Flags: needinfo?(giles)
platform-rel: + → -
I think we can consider this resolved. We have an implementation in Firefox with interoperates with the other implementations I'm aware of. We could pursue more formal standardization, but I don't personally want to tackle that right now. Further development of Monty's spec should happen in the upstream flac repository. https://git.xiph.org/flac.git or https://github.com/xiph/flac
Status: NEW → RESOLVED
Last Resolved: 11 months ago
Flags: needinfo?(giles)
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.