Closed Bug 1101534 Opened 10 years ago Closed 9 years ago

Re-enable support for HE-AAC detection, in MP4 files using SBR implicit signalling

Categories

(Core :: Audio/Video, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla37
Tracking Status
firefox36 --- fixed
firefox37 --- fixed

People

(Reporter: jya, Assigned: jya)

References

Details

Attachments

(2 files, 4 obsolete files)

In bug 1096769, we dropped support for SBR implicit signalling, and are now unable to detect SBR data (and as such HE-AAC) in a stream such as:
http://www-tvline-com.vimg.net/streaming/tvline/MAA101-Medianet.mp4


as per comment 7 in bug 1096769, (https://bugzilla.mozilla.org/show_bug.cgi?id=1096769#c7):

Some notes on why HE-AAC in some stream aren't recognised any longer:
https://developer.apple.com/library/ios/technotes/tn2236/_index.html

Basically, now that we are feeding directly raw aac data, and read the AAC magic cookie only explicit signalling is supported.

We also need implicit signalling support, and only via an ADTS stream can Core Audio's APIs detect it.

This ticket aims to track restoring this feature on OSX
Depends on: 1096769
Attached patch Add MP4Sample copy constructor (obsolete) — Splinter Review
Add MP4Sample copy constructor
Attachment #8528280 - Flags: review?(ajones)
Detect HE-AAC streams where SBR signalling is implicit (e.g. must partially decode the stream to see SBR. This is done by using Apple's AudioFileStream API. We remux the raw AAC data in an ADTS stream and feed it to an AudioFileStream until we can extract the AAC magic cookie. We then process this magic cookie to determine the best quality available.
Attachment #8528323 - Flags: review?(giles)
Attachment #8528280 - Flags: review?(ajones) → review+
Comment on attachment 8528323 [details] [diff] [review]
detect SBR (HE-AAC) in AAC streams with implicit signalling

Review of attachment 8528323 [details] [diff] [review]:
-----------------------------------------------------------------

The approach looks fine, but I'd like to see it again with the comments addressed.

::: dom/media/fmp4/apple/AppleATDecoder.cpp
@@ +18,5 @@
>  #else
>  #define LOG(...)
>  #endif
> +#include <TargetConditionals.h>
> +#if TARGET_RT_BIG_ENDIAN

Is this an Apple define? The gecko equivalent is MOZ_BIG_ENDIAN; saves an include.

@@ +21,5 @@
> +#include <TargetConditionals.h>
> +#if TARGET_RT_BIG_ENDIAN
> +#   define FourCC2Str(fourcc) (const char[]){*((char*)&fourcc), *(((char*)&fourcc)+1), *(((char*)&fourcc)+2), *(((char*)&fourcc)+3),0}
> +#else
> +#   define FourCC2Str(fourcc) (const char[]){*(((char*)&fourcc)+3), *(((char*)&fourcc)+2), *(((char*)&fourcc)+1), *(((char*)&fourcc)+0),0}

Wow. I didn't know you could pass an array as a string literal like this. I'm impressed!

Unfortunately I don't like it. This is hard to understand and doesn't do any type checking or normalization of non-printable characters. That's fine when it's logging stuff out of an Apple API, but I'm concerned someone will copy it into a context where it's parsing stream data.

Please make this a class with the above features instead? You said you were bored. :)

We also don't support big-endian macs any more, so you can just #error in that case if you want. Might also be safer than assuming a particular layout for fourcc values on future architectures?

@@ +188,5 @@
> +      return;
> +    }
> +  }
> +
> +  if (!mConverter) {

Checking for !mConverter twice is confusing here. Please document that SetupDecoder returns NS_OK without setting up the decoder if it needs more data.

Or have SetupDecoder() return NS_ERROR_NOT_INITIALIZED in that case and check for that here to be more clear.

@@ +207,5 @@
> +    }
> +    if (NS_FAILED(DecodeSample(aSample))) {
> +      mCallback->Error();
> +      return;
> +    }

You could skip this clause if you just always queue sample at the start of the function.

@@ +279,1 @@
>    }

Better style here. Thanks.

@@ +385,5 @@
> +  PodZero(&inputFormat);
> +
> +  if (mFormatID == kAudioFormatMPEG4AAC &&
> +      mConfig.extended_profile == 2) {
> +    // Check for implicit SBR signalling if stream is AAC-LC

What happens if the extended_profile isn't 2? Are there no such streams, or do they somehow work without ADTS?

::: dom/media/fmp4/apple/AppleATDecoder.h
@@ +46,5 @@
>    AudioConverterRef mConverter;
>    AudioStreamBasicDescription mOutputFormat;
>    UInt32 mFormatID;
> +  AudioFileStreamID mStream;
> +  nsTArray<mp4_demuxer::MP4Sample*> mQueuedSamples;

You'd think there would be an nsTArray variant which takes ownership, but apparently all we have is nsTArray<nsAutoPtr<mp4_demuxer::MP4Sample>>.


I think that's still better than having the ClearQueueSamples() helper. With an smart ptr you should be able to call mQueuedSamples.Clear()() directly.

@@ +51,5 @@
>  
>    void SubmitSample(nsAutoPtr<mp4_demuxer::MP4Sample> aSample);
> +  nsresult DecodeSample(mp4_demuxer::MP4Sample* aSample);
> +  nsresult GetInputAudioDescription(AudioStreamBasicDescription& aDesc,
> +                                    const Vector<uint8_t>& aExtraData);

You can make this static now that you're not accesses mConfig any more. Don't expect it matters much, though.
Attachment #8528323 - Flags: review?(giles) → review-
review- for those comments???

you gotta be kidding me...

(In reply to Ralph Giles (:rillian) from comment #3)

> ::: dom/media/fmp4/apple/AppleATDecoder.cpp
> @@ +18,5 @@
> >  #else
> >  #define LOG(...)
> >  #endif
> > +#include <TargetConditionals.h>
> > +#if TARGET_RT_BIG_ENDIAN
> 
> Is this an Apple define? The gecko equivalent is MOZ_BIG_ENDIAN; saves an
> include.
> 

it is, it's the Apple's official way to test for endianess.

> @@ +21,5 @@
> > +#include <TargetConditionals.h>
> > +#if TARGET_RT_BIG_ENDIAN
> > +#   define FourCC2Str(fourcc) (const char[]){*((char*)&fourcc), *(((char*)&fourcc)+1), *(((char*)&fourcc)+2), *(((char*)&fourcc)+3),0}
> > +#else
> > +#   define FourCC2Str(fourcc) (const char[]){*(((char*)&fourcc)+3), *(((char*)&fourcc)+2), *(((char*)&fourcc)+1), *(((char*)&fourcc)+0),0}
> 
> Wow. I didn't know you could pass an array as a string literal like this.
> I'm impressed!
> 
> Unfortunately I don't like it. This is hard to understand and doesn't do any
> type checking or normalization of non-printable characters. That's fine when
> it's logging stuff out of an Apple API, but I'm concerned someone will copy
> it into a context where it's parsing stream data.

yet it works, and IMHO is far better (and more elegant) than the use of PROPERTY_ID_FORMAT and PROPERTY_ID_PRINT macro defined in AppleUtils.cpp

FWIW, that code is what's extracted from Apple's own iPhone APIs but unfortunately not available on OS X..
this is used to differentiate between running on iPhone simulator (little endian) vs the real thing (big endian)

> 
> Please make this a class with the above features instead? You said you were
> bored. :)

don't see the point of this overhead when it's only used to display specific debugging. Would prefer to remove it alltogether. Though it is useful information IMHO.

> 
> We also don't support big-endian macs any more, so you can just #error in
> that case if you want. Might also be safer than assuming a particular layout
> for fourcc values on future architectures?

see above...
it's how fourcc code are defined in Apple's API

> Or have SetupDecoder() return NS_ERROR_NOT_INITIALIZED in that case and
> check for that here to be more clear.

OK
Agree, it's not super clear there.

> You could skip this clause if you just always queue sample at the start of
> the function.

could do yes.

make the code simpler too.

> > +  if (mFormatID == kAudioFormatMPEG4AAC &&
> > +      mConfig.extended_profile == 2) {
> > +    // Check for implicit SBR signalling if stream is AAC-LC
> 
> What happens if the extended_profile isn't 2? Are there no such streams, or
> do they somehow work without ADTS?

ADTS can only handle profile 1 to 4 (it is stored on two bits). So any profile outside that range would cause Adts::ConvertSample to error.

A HE-AAC with SBR (v1) and/or PS (v2) is always by definition in complement to a AAC-LC core stream
http://en.wikipedia.org/wiki/High-Efficiency_Advanced_Audio_Coding

it is defined in ISO/IEC 14496-3 : "The
High Efficiency AAC Profile contains the audio object types 5 (SBR) and 2 (AAC LC). The High Efficiency AAC Profile is a superset of the AAC Profile. "

Type 5 is explicit and the required information will be found in the existing magic cookie (which in MP4 is the edts extradata) so we don't need to bother about that one.

Only checking for AAC-LC is all we need to worry about, the AudioFileStream demuxer would return the exact same magic cookie instantly

> 
> ::: dom/media/fmp4/apple/AppleATDecoder.h
> @@ +46,5 @@
> >    AudioConverterRef mConverter;
> >    AudioStreamBasicDescription mOutputFormat;
> >    UInt32 mFormatID;
> > +  AudioFileStreamID mStream;
> > +  nsTArray<mp4_demuxer::MP4Sample*> mQueuedSamples;
> 
> You'd think there would be an nsTArray variant which takes ownership, but
> apparently all we have is nsTArray<nsAutoPtr<mp4_demuxer::MP4Sample>>.
> 

I only used the same approach you've used in the H264 decoder !
void
AppleVDADecoder::ClearReorderedFrames()
 {
   while (!mReorderQueue.IsEmpty()) {
     delete mReorderQueue.Pop();
   }
 }

I simply renamed the function to be more in line with the content type.

> 
> I think that's still better than having the ClearQueueSamples() helper. With
> an smart ptr you should be able to call mQueuedSamples.Clear()() directly.
> 
> @@ +51,5 @@
> >  
> >    void SubmitSample(nsAutoPtr<mp4_demuxer::MP4Sample> aSample);
> > +  nsresult DecodeSample(mp4_demuxer::MP4Sample* aSample);
> > +  nsresult GetInputAudioDescription(AudioStreamBasicDescription& aDesc,
> > +                                    const Vector<uint8_t>& aExtraData);
> 
> You can make this static now that you're not accesses mConfig any more.
> Don't expect it matters much, though.

mConfig is still accessed, either that member or another.. can't make them static (don't see the value of doing this quite honestly, even if it was possible)
Carrying r+ comments
Attachment #8528769 - Flags: review?(giles)
Attachment #8528323 - Attachment is obsolete: true
(In reply to Jean-Yves Avenard [:jya] from comment #4)

> you gotta be kidding me...

I love you too, jya. :)

> it is, it's the Apple's official way to test for endianess.

Ok, thanks.

> yet it works, and IMHO is far better (and more elegant) than the use of
> PROPERTY_ID_FORMAT and PROPERTY_ID_PRINT macro defined in AppleUtils.cpp

I know it works. That wasn't my complaint.

> this is used to differentiate between running on iPhone simulator (little
> endian) vs the real thing (big endian)

iPhone is big endian? That's exciting. I thought all the arm targets had switched to little endian for better compatibility with x86 code. That and the __arm__ stanza in TargetConditionals.h on MacOS X.

> don't see the point of this overhead when it's only used to display specific
> debugging.

Debugging is exactly where you don't care about the overhead.

Anyway, what you have in the revised patch bothers me less. Maybe it's successful in looking less magical. Going to fix the instance in AppleUtils, then?

> > Or have SetupDecoder() return NS_ERROR_NOT_INITIALIZED in that case and
> > check for that here to be more clear.
> 
> OK
> Agree, it's not super clear there.

Thanks.

> > What happens if the extended_profile isn't 2? Are there no such streams, or
> > do they somehow work without ADTS?
> 
> Type 5 is explicit and the required information will be found in the
> existing magic cookie (which in MP4 is the edts extradata) so we don't need
> to bother about that one.

I'm still confused here. I see that the AudioFileStream will extract the magic cookie,
but don't we still need to apply ADTS framing regardless? Are you saying extended_profile 2 is the same class as the ADTS-compatible non-extended profile? Perhaps this needs a longer comment.

> I only used the same approach you've used in the H264 decoder !
> void
> AppleVDADecoder::ClearReorderedFrames()
>  {
>    while (!mReorderQueue.IsEmpty()) {
>      delete mReorderQueue.Pop();
>    }
>  }

Just because I had a bad idea, doesn't mean you should copy it!

The ReorderQueue is a nsTPriorityQueue<nsRefPtr<>>, but it does have a Clear method, so maybe I could do the same thing?

> mConfig is still accessed, either that member or another.. can't make them
> static (don't see the value of doing this quite honestly, even if it was
> possible)

Ok. I have a vague idea static methods save memory or dispatch speed, as well as being more easily reusable. I think you're right that it's moot in this case.
(In reply to Ralph Giles (:rillian) from comment #6)
> (In reply to Jean-Yves Avenard [:jya] from comment #4)
> 
> > you gotta be kidding me...
> 
> I love you too, jya. :)

shhhhhhh it's a public forum ! you promised to be quiet


> iPhone is big endian? That's exciting. I thought all the arm targets had
> switched to little endian for better compatibility with x86 code. That and
> the __arm__ stanza in TargetConditionals.h on MacOS X.

it could be, now I'm not sure anymore :)
maybe I'm confusing with something else..

> Anyway, what you have in the revised patch bothers me less. Maybe it's
> successful in looking less magical. Going to fix the instance in AppleUtils,
> then?

I can add it to AppleUtils headers instead sure... in another patch


> > Type 5 is explicit and the required information will be found in the
> > existing magic cookie (which in MP4 is the edts extradata) so we don't need
> > to bother about that one.
> 
> I'm still confused here. I see that the AudioFileStream will extract the
> magic cookie,
> but don't we still need to apply ADTS framing regardless? Are you saying
> extended_profile 2 is the same class as the ADTS-compatible non-extended
> profile? Perhaps this needs a longer comment.
> 

Decoding is still done purely using the raw AAC frames.
that what bug 1096769 did.
To do so, you need the magic cookie. The MP4 container contains one (the edts extradata), but in the case of HE-AAC, it usually only has a magic cookie that only describes the AAC-LC stream.

The trick of this patch is that I use the AudioFileStream to analyse the AAC and extract an updated magic cookie that is more complete (it will contains the description of both the SBR stream and the AAC-LC one).
Once the more complete magic cookie is extracted, I dump the AudioFileStream and do as before: feed raw AAC to the converter.

So I only need the ADTS encapsulated data while we need the AudioFileStream; we don't need ADTS per say but that's what the AudioFileStream takes as input, so I have to do with it.

So yes, we could use your original code to decode the ADTS stream when the AAC profile is <= 4, and use the new one for profile >= 5, but that was adding more code IMHO and would fuzz the logic. You would have in one case the converter fed with data directly if profile was >= 5, or via the AudioFileStream callback if encapsulated with ADTS. And then you would have to hack the timestamp once again.

As I've done now, there's no need for that, you get a 1:1 input/output ratio when feeding the converter. It doesn't buffer anything. Makes it much easier dealing with pts.

> Just because I had a bad idea, doesn't mean you should copy it!
> 
> The ReorderQueue is a nsTPriorityQueue<nsRefPtr<>>, but it does have a Clear
> method, so maybe I could do the same thing?

it's a RefPtr only since mediadata was modified to be ref counted. But yes, we could use Clear() now too.

So where's my r+ ? :)
of forgot to add...

aac_profile and extended_profile are one and the same. It just happens that the MP4 demuxer will set aac_profile to 0 if the value is > 4...

So if the stream is a AAC-LC stream, its aac_profile is 2 and the extended_profile is 2 as well.

For explicit HE-AAC, aac_profile is 0 and extended_profile is 5.
Comment on attachment 8528769 [details] [diff] [review]
detect SBR (HE-AAC) in AAC streams with implicit signalling

Review of attachment 8528769 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for explaining. r=me.

::: dom/media/fmp4/apple/AppleATDecoder.cpp
@@ +378,5 @@
> +    mp4_demuxer::MP4Sample sample(*aSample);
> +    bool rv = mp4_demuxer::Adts::ConvertSample(mConfig.channel_count,
> +                                               mConfig.frequency_index,
> +                                               mConfig.aac_profile,
> +                                               &sample);

Ok, I see now. What's confusing is that it looks like you convert every AAC profile 2 sample to ADTS (and no others). But what you're actually doing is converting a _copy_ of the sample, and passing that into mStream inside GetImplicitAACMagicCookie() until you get a result. Other stream types are instead configured based the old-style GetInputAudioDescription().

The whole thing is protected by the 'if (!mConverter)' at the top of SubmitSample() where the original, non-modified copy of the sample is passed to the decoder as raw aac.

Maybe it would be better to move the copy and Adts conversion inside GetImplicitAACMagicCookie? Those details aren't otherwise relevant to SetupDecoder and it will make the followup with GetInputAudioDescription follow more obviously.
Attachment #8528769 - Flags: review?(giles) → review+
(In reply to Ralph Giles (:rillian) from comment #10)
> Comment on attachment 8528769 [details] [diff] [review]
> detect SBR (HE-AAC) in AAC streams with implicit signalling
> 
> Review of attachment 8528769 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Thanks for explaining. r=me.
> 
> ::: dom/media/fmp4/apple/AppleATDecoder.cpp
> @@ +378,5 @@
> > +    mp4_demuxer::MP4Sample sample(*aSample);
> > +    bool rv = mp4_demuxer::Adts::ConvertSample(mConfig.channel_count,
> > +                                               mConfig.frequency_index,
> > +                                               mConfig.aac_profile,
> > +                                               &sample);
> 
> Ok, I see now. What's confusing is that it looks like you convert every AAC
> profile 2 sample to ADTS (and no others). But what you're actually doing is
> converting a _copy_ of the sample, and passing that into mStream inside
> GetImplicitAACMagicCookie() until you get a result. Other stream types are
> instead configured based the old-style GetInputAudioDescription().

It is unfortunate that Adts::ConvertSample works on the original sample and is destructive.
It would have been better if it had returned a new MP4Sample, especially as internally a new memory buffer is allocated and data is copied.

That's why I created a MP4Sample copy constructor.

> 
> The whole thing is protected by the 'if (!mConverter)' at the top of
> SubmitSample() where the original, non-modified copy of the sample is passed
> to the decoder as raw aac.

yes, as soon as we've created a converter, we won't enter SetupDecoder anymore, and there's no more ADTS conversion occurring.
that also minimize the memory usage.
> 
> Maybe it would be better to move the copy and Adts conversion inside
> GetImplicitAACMagicCookie? Those details aren't otherwise relevant to
> SetupDecoder and it will make the followup with GetInputAudioDescription
> follow more obviously.

You're right... it is indeed clearer that way, will update...
(In reply to Jean-Yves Avenard [:jya] from comment #11)

> It is unfortunate that Adts::ConvertSample works on the original sample and
> is destructive.
> It would have been better if it had returned a new MP4Sample, especially as
> internally a new memory buffer is allocated and data is copie

Feel free to make it so...in a separate bug.
Carrying r+ comment. Move ADTS creating to GetImplicitAACMagicCookie ; rename variable there to make it more explicit.
Attachment #8528769 - Attachment is obsolete: true
try run appears busted and didn't resume.

so another one:
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=28d3deb38a66
Keywords: checkin-needed
Attachment #8528280 - Attachment is obsolete: true
Fixing non-unified compilation
Attachment #8529411 - Attachment is obsolete: true
https://hg.mozilla.org/mozilla-central/rev/304bca47f33f
https://hg.mozilla.org/mozilla-central/rev/1998957c30f0
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla37
Comment on attachment 8530147 [details] [diff] [review]
Add MP4Sample copy constructor

Approval Request Comment
[Feature/regressing bug #]: MSE
[User impact if declined]: Less consistent playback support, video sites more likely to use flash.
[Describe test coverage new/current, TBPL]: Landed on m-c.
[Risks and why]: Low. This patch just adds a copy constructor; no running code is changed.
[String/UUID change made/needed]: None
Attachment #8530147 - Flags: approval-mozilla-aurora?
Comment on attachment 8530220 [details] [diff] [review]
detect SBR (HE-AAC) in AAC streams with implicit signalling

Approval Request Comment
[Feature/regressing bug #]: MSE
[User impact if declined]: Some mp4 audio tracks will not play.
[Describe test coverage new/current, TBPL]: Landed on m-c.
[Risks and why]: Low. Change is straightforward to handle this class of files properly.
[String/UUID change made/needed]: None
Attachment #8530220 - Flags: approval-mozilla-aurora?
Attachment #8530147 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8530220 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.