Closed Bug 1286097 Opened 8 years ago Closed 8 years ago

Add FLAC (codec) support in MP4

Categories

(Core :: Audio/Video: Playback, defect, P3)

defect

Tracking

()

RESOLVED FIXED
Tracking Status
platform-rel --- -

People

(Reporter: jya, Unassigned)

References

Details

(Whiteboard: [platform-rel-BBC][platform-rel-BBCNews][platform-rel-Vimeo])

Attachments

(1 file, 4 obsolete files)

Aim is to be able to have flac lossless audio stream embedded in the mp4 container.
Flags: needinfo?(tterribe)
(In reply to Timothy B. Terriberry (:derf) from comment #1) > Examples to crib from: > > https://github.com/Netflix/vp9-dash/raw/master/Downloads/ > VPCodecISOMediaFileFormatBinding.pdf > https://www.opus-codec.org/docs/opus_in_isobmff.html Are we in fact putting Opus in ISOBMFF, or is it just a superset example?
(In reply to cmontgomery from comment #2) > Are we in fact putting Opus in ISOBMFF, or is it just a superset example? Yes. Bug 1240413. I'm not sure what you mean by "superset example".
(In reply to Timothy B. Terriberry (:derf) from comment #3) > (In reply to cmontgomery from comment #2) > > Are we in fact putting Opus in ISOBMFF, or is it just a superset example? > > Yes. Bug 1240413. I'm not sure what you mean by "superset example". I meant 'are we going for full ISOBMFF with all the associated structured metadata', or 'are we only wrapping things in enough MP4 for valid streaming'? Having talked to Anthony at the all-hands, he seemed to think we only needed the streaming case. OTOH, if the BMFF metadata is mostly a spec cut-and-paste, then the spec should include it.
Oh, I see, I'm just confused. The only real 'metadata' there is explicitly extending/specifying the Movie Box. I thought there was a lot more there.
We're really looking forward to seeing this spec, and the subsequent implementation of course. Please let me know in the first instance if there's anything we can do to help the specification process - we are happy to review etc.
Initial draft of FLAC in ISO BMFF
Uploading a first draft now. Although I'm reasonably certain that what it is trying to say is ~ correct, I'll admit this is my first time working in ISO's pseudocode structure markup, so I may have stumbled over it a bit. Now is a good time for review and commentary.
Just a few comments, as I am by no means an ISO BMFF expert: Sections 3.4 and 4.2 contain dangling references to native metadata. I think the explanations at the end of Section 4.3.2 are probably sufficient and you can just delete these. The MetadataBlocks field in the FLACSpecificBox is redundant with the LastMetadataBlockFlag in the FLACMetadataBlock structure. It also artificially limits FLAC to 255 metadata blocks ("which should be enough for anybody"). I recognize that without it, you can't write the loop in FLACSpecificBox syntax with a fixed upper bound (and in practice the 255 limit is probably fine), but is there any other reason it's there? I'd expect at least that the redundancy be pointed out in the text so that the count is mandated to agree with the flags (the same way the channel count and sample depth are mandated to agree in all of the redundant places they're stored). 4.3.6.2: Is this section necessary? We had one for Opus because Opus does have pre-roll, but I suspect it's usually just not discussed for formats that lack the concept. 4.4.1: "may be present" ... did you mean optional? Might be good to have an acknowledgments section somewhere crediting Yusuke?
(In reply to Timothy B. Terriberry (:derf) from comment #9) > Just a few comments, as I am by no means an ISO BMFF expert: > > Sections 3.4 and 4.2 contain dangling references to native metadata. I think > the explanations at the end of Section 4.3.2 are probably sufficient and you > can just delete these. probably a good idea. > The MetadataBlocks field in the FLACSpecificBox is redundant with the > LastMetadataBlockFlag in the FLACMetadataBlock structure. It also > artificially limits FLAC to 255 metadata blocks ("which should be enough for > anybody"). I recognize that without it, you can't write the loop in > FLACSpecificBox syntax with a fixed upper bound (and in practice the 255 > limit is probably fine), but is there any other reason it's there? I'd > expect at least that the redundancy be pointed out in the text so that the > count is mandated to agree with the flags (the same way the channel count > and sample depth are mandated to agree in all of the redundant places > they're stored). That pretty much summarizes the devil on my other shoulder. I just wanted to avoid having to parse the list to get the length, nothing more. I'd be happy to do it your way too. > 4.3.6.2: Is this section necessary? We had one for Opus because Opus does > have pre-roll, but I suspect it's usually just not discussed for formats > that lack the concept. OK. > 4.4.1: "may be present" ... did you mean optional? This is copying Yusuke's wording (and his diagram for the most part), but I did mean 'may or may not be present, depending on what's appropriate in context'. > Might be good to have an acknowledgments section somewhere crediting Yusuke? Yes. I just didn't want people contacting with questions or errata.
(In reply to cmontgomery from comment #10) > That pretty much summarizes the devil on my other shoulder. I just wanted > to avoid having to parse the list to get the length, nothing more. I'd be > happy to do it your way too. the MP4 box already has a size set, so we can deduct the size of all the metadata blocks but substracting 1 (the byte used for the version) from the box size right? As far as the player is concerned, all it cares about (at least with FF implementation) is the STREAMINFO block, and maybe the vorbis comment. So once it has found that, it can directly jump to the end of the box. Having said all that, why not make it all identical to the Ogg mapping? https://xiph.org/flac/ogg_mapping.html The one-byte packet type 0x7F The four-byte ASCII signature "FLAC", i.e. 0x46, 0x4C, 0x41, 0x43 A one-byte binary major version number for the mapping, e.g. 0x01 for mapping version 1.0 A one-byte binary minor version number for the mapping, e.g. 0x00 for mapping version 1.0 A two-byte, big-endian binary number signifying the number of header (non-audio) packets, not including this one. This number may be zero (0x0000) to signify 'unknown' but be aware that some decoders may not be able to handle such streams. The four-byte ASCII native FLAC signature "fLaC" according to the FLAC format specification The STREAMINFO metadata block for the stream. The difference being is that the single FLACSpecificBox box contains all at once which would closely match the flac raw container once you skip the first 13 bytes.
(In reply to Jean-Yves Avenard [:jya] from comment #11) > (In reply to cmontgomery from comment #10) > > That pretty much summarizes the devil on my other shoulder. I just wanted > > to avoid having to parse the list to get the length, nothing more. I'd be > > happy to do it your way too. > > the MP4 box already has a size set, so we can deduct the size of all the > metadata blocks but substracting 1 (the byte used for the version) from the > box size right? Yes. > As far as the player is concerned, all it cares about (at least with FF > implementation) is the STREAMINFO block, and maybe the vorbis comment. So > once it has found that, it can directly jump to the end of the box. > > Having said all that, why not make it all identical to the Ogg mapping? I saw no reason to do so given that the more common native stream doesn't do it that way, nor does Matroska. I think I'll just drop the number of blocks field since it's purely redundant. Monty
Attachment #8779826 - Attachment is obsolete: true
Apologies for the extreme delay in replying to this. We've reviewed the updated draft and, on the whole, it looks really great. Just a few comments: In 4.3.1, to make it clear the fields with a '+' bullet point are not extensions to the AudioSampleEntryBox, it would be good to see a comment saying something along the lines of "The fields from the AudioSampleEntryBox shall be set as follows:" In 4.3.2, the FlacSpecificBox extends Box and has a version field. Normally if versioning is required we would expect FullBox to be the parent since versioning is then built into the box header, with flags set to 0. Typos: 4.2 Bullet point 5 references a "FLAC packet". Packet does not seem to be defined in the FLAC spec. Should this be "FLAC frame"? 4.3.2 MEATADATA_BLOCK_HEADER -> METADATA_BLOCK_HEADER 4.3.2 contianing -> containing 4.3.2 METADATA_BLOCKDATA -> METADATA_BLOCK_DATA 4.3.3 FLACDSampleEntry -> FLACSampleEntry
Depends on: 1303888
platform-rel: --- → ?
Whiteboard: [platform-rel-BBC][platform-rel-BBCNews]
(In reply to David Evans from comment #14) > Apologies for the extreme delay in replying to this. > > We've reviewed the updated draft and, on the whole, it looks really great. > Just a few comments: > > In 4.3.1, to make it clear the fields with a '+' bullet point are not > extensions to the AudioSampleEntryBox, it would be good to see a comment > saying something along the lines of "The fields from the AudioSampleEntryBox > shall be set as follows:" OK, will add. > In 4.3.2, the FlacSpecificBox extends Box and has a version field. Normally > if versioning is required we would expect FullBox to be the parent since > versioning is then built into the box header, with flags set to 0. If that's the usual convention, we should do it that way. I will update the spec. > Typos: > 4.2 Bullet point 5 references a "FLAC packet". Packet does not seem to be > defined in the FLAC spec. Should this be "FLAC frame"? Yes, I've stumbled a bit over some aspect of the two specs using different but overlapping terminology. I'll clarify (and your assumption is correct). > 4.3.2 MEATADATA_BLOCK_HEADER -> METADATA_BLOCK_HEADER > 4.3.2 contianing -> containing > 4.3.2 METADATA_BLOCKDATA -> METADATA_BLOCK_DATA > 4.3.3 FLACDSampleEntry -> FLACSampleEntry Will got these too.
> Will got these too. How ironic. Ignore the typo with respect to correcting typos.
platform-rel: ? → +
By the way, I recommend to register the identifier at http://www.mp4ra.org/request.html.
Whiteboard: [platform-rel-BBC][platform-rel-BBCNews] → [platform-rel-BBC][platform-rel-BBCNews][platform-rel-Vimeo]
Attached file Draft v0.0.2 (obsolete) —
I took the liberty of updating the draft in response to comments. - Removed implicit MetadataBlocks count - Made FLACSpecificBox extend FullBox (BREAKING CHANGE) - Added clarifying text to field descriptions. - Fixed some typos, formatting, and wording. In implementing this, I found the implicit MetadataBlock count a little strange. Almost everything in mp4 is length-prefixed, usually by wrapping it in a box so I wasn't able to use a fixed loop bound or our normal iterator. However, one does generally have to clip reads to the declared length of the parent boxes, so in practice it was fine, and there is precedent for 'read until the end without box headers' e.g. in AlternateStartupEntry ('alst') and ProgressiveDownloadInfoBox ('pdin'). The current design is probably worthwhile for keeping the native block structure intact for ease of remuxing.
Attachment #8782097 - Attachment is obsolete: true
Attachment #8797226 - Flags: feedback?(monty)
Some review comments from Paranoialmaniac, author of the Opus-in-mp4 spec: > at a glance, it is unneeded to mention about the edit > list. flac has no paddings and no requirements for > removing it by the edit list, isn't it? i think all terms > and definitions there are unneeded To clarify, flac is a block-based codec, but there are two internal mechanisms which can reproduce a fixed duration. (1) Block sizes are variable. Although there is a minimum size of 16 samples, the last two blocks can be adjusted to set any arbitrary length over that. https://xiph.org/flac/format.html#blocking (2) The STREAMINFO header can optionally indicate the total number of samples (up to 36 bits, or 4 days at 192 kHz). I suspect, but have not verified, that the reference implementation uses fixed block lengths and sets a total sample number to trim the output. I think removing the edit list definition is fine; it and the 'active track' are dangling references. But we should add text to the encapsulation section to clarify that the decoder will trim the input frames to the correct duration. No offset or padding is necessary and and edit list is purely for post-encode trimming. > samplerate field, which is 32bit field, in > AudioSampleEntry is left-shifted by 16bits. so it can > express up to 65535Hz This is a pretty serious issue given the popularity of 96 and 192 kHz flac files. ISO mp4 doesn't traditionally support sample rates over 65kHz. There is an AudioSampleRateV1 extension defined in 14496-12:2015 (or 2012/Amd.2:2014) but it is not widely supported. For compatibility, it is apparently common to instead put the largest expressible multiple of the sample rate in this field and then override with codec-specific data. There is precedent for this. > The samplerate, samplesize and channelcount fields document the default > audio output playback format for this media. The timescale for an audio > track should be chosen to match the sampling rate, or be an integer > multiple of it, to enable sample-accurate timing. > [...] > The audio output format (samplerate, samplesize and channelcount fields) > in the sample entry should be considered definitive only for codecs that > do not record their own output configuration. If the audio codec has > definitive information about the output format, it shall be taken as > definitive; in this case the samplerate, samplesize and channelcount > fields in the sample entry may be ignored, though sensible values should > be chosen (for example, the highest possible sampling rate). > [ISO 14496-12:2012, sec 8.5.2.1, pages 29, 30.] In the AudioSampleEntryV1 ammendment, the sample-accurate timing choice won out. Paranoialmaniac advises: > it is complex. the old isobmff spec says samplerate = mdhd.timescale << 16 > the latest has different defintion. see 14496-12:2015 12.2.3.1 Definition > anyway, for high sampling rates, you should fix the draft. > for almost all codecs, samplerate field is useless. if you > want indicate actual samlerate within the spec of isobmff, > use AudioSampleEntryV1 and store the actual value in > sampling rate box > (i don't recommend use of AudioSampleEntryV1 for compatibilities) > [...] > it's really difficult question :D but 14496-12:2015 says > "the samplerate field in the sample entry should contain a > value left‐shifted 16 bits (as for AudioSampleEntry) that > matches the media timescale, or be an integer division or > multiple of it.". so, i prefer the maximum value of > available divisions of the high sampling rate. e.g. for > 96000Hz, set samplerate to 48000 > and let FLACSpecificBox overrides it
Attached file Draft v0.0.3 (obsolete) —
Update draft in response to Paranoialmaniac's comments. - Removed 'Terms and Definitions' section. I decided in the end it was clear enough without mentioning edit lists. - Describe how to handle high-rate files. I.e. put 48kHz for 96 and 192 kHz bitstreams and read the FLAC Specific Box for the override. - Removed `aligned(8)` declaration on FLACMetadataBlock. While some flac metadata blocks are multiples of 8 bytes in length, many, including STREAMINFO VORBIS_COMMENT and PICTURE are not, so unless we add some machanism for padding, blocks after the first can end with only byte alignment.
Attachment #8797226 - Attachment is obsolete: true
Attachment #8797226 - Flags: feedback?(monty)
Attachment #8797684 - Flags: feedback?(monty)
I've landed copies of the drafts so far in the upstream flac repo for easier reference. https://git.xiph.org/?p=flac.git;a=blob;f=doc/isoflac.txt
(In reply to cmontgomery from comment #15) > (In reply to David Evans from comment #14) > > 4.3.2 MEATADATA_BLOCK_HEADER -> METADATA_BLOCK_HEADER This typo still appears in three places.
Attached file Draft v0.0.4
Fix typos reported in #22.
Attachment #8797684 - Attachment is obsolete: true
Attachment #8797684 - Flags: feedback?(monty)
Rank: 10
Ralph - what do we need to do before we close this bug?
Flags: needinfo?(giles)
I think we can consider this resolved. We have an implementation in Firefox with interoperates with the other implementations I'm aware of. We could pursue more formal standardization, but I don't personally want to tackle that right now. Further development of Monty's spec should happen in the upstream flac repository. https://git.xiph.org/flac.git or https://github.com/xiph/flac
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(giles)
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: