Closed Bug 1093815 Opened 5 years ago Closed 5 years ago

Use AndroidPlatformDecoder for standalone MP3 on Android

Categories

(Core :: Audio/Video, defect)

All
Android
defect
Not set

Tracking

()

VERIFIED FIXED
mozilla41
Tracking Status
firefox35 --- affected
firefox36 --- affected
firefox37 --- affected
firefox38 --- affected
firefox41 --- verified
relnote-firefox --- 41+
fennec 41+ ---

People

(Reporter: snorp, Assigned: esawin)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 13 obsolete files)

32.36 KB, patch
esawin
: review+
Details | Diff | Splinter Review
1.75 MB, patch
esawin
: review+
Details | Diff | Splinter Review
1.24 KB, patch
kinetik
: review+
Details | Diff | Splinter Review
Bug 1014614 gives us H264/AAC with MP4 via MediaCodec, but we are still relying on OmxPlugin for MP3 (and maybe other stuff?). Since that's busted on Android Lollipop (bug 1050832), those users currently have no standalone MP3 support.
tracking-fennec: --- → ?
:cpearce says that a platform-independent MP3 reader which relies on PlatformDecoderModule for actual decoding has been under consideration for a while. This might be a reason to finally do that.
Summary: Use MediaExtractor and MediaCodec for standalone MP3 on Android → Use AndroidPlatformDecoder for standalone MP3 on Android
tracking-fennec: ? → 37+
Blocks: android-l
Duplicate of this bug: 1102749
This is a huge loss which is tracking a future release, we should retarget.
Assignee: nobody → snorp
Assignee: snorp → kinetik
Status: NEW → ASSIGNED
I take it this isn't happening on 37.
It will happen on any Lollipop device, on any version of Firefox
Oh I misunderstood. We might be able to get it in for 37.
Duplicate of this bug: 1134427
tracking-fennec: 37+ → ?
tracking-fennec: ? → 39+
WIP patch to add MP3TrackDemuxer.

Finished:
* MP3 frame parser
* ID3 header parser
* MP3 frame reading

TODO:
* MP3 frame seeking
* Tests and validation
* Comments
Attachment #8585816 - Flags: feedback?(kinetik)
Comment on attachment 8585816 [details] [diff] [review]
0001-Bug-1093815-Add-MP3-track-demuxer.patch

Review of attachment 8585816 [details] [diff] [review]:
-----------------------------------------------------------------

This looks great overall, thank you!

In general, rather than working with raw uint8_t pointers, it'd be nicer (and safer) to use ByteReader.  That provides some protection from accidental (or malicious) out-of-bounds reads.  For instance, if the buffer ended immediately after the XING tag, ParseXing could read two uint32_ts from out-of-bounds memory.

I'll refresh my memory with MP3 parsing details and go over some of the more specific bits soon, but I wanted to get the early feedback to you as soon as possible.

::: media/libstagefright/binding/MP3TrackDemuxer.cpp
@@ +13,5 @@
> +
> +// Misc helper functions
> +
> +template<typename T>
> +uint32_t FromBigEndian(const T* aBeg) {

Use BigEndian::readUint32 from Endian.h.

@@ +89,5 @@
> +}
> +
> +bool
> +MP3Demuxer::Init() {
> +  return mSource->Length(&mStreamLength);

It's possible we won't know the length of the file (e.g. streaming from a ShoutCast style server) but we still want to be able to at least demux and play the file, even if we can't support more advanced operations such as seeking.

@@ +144,5 @@
> +}
> +
> +MP4Sample*
> +MP3Demuxer::GetNext() {
> +  uint8_t buffer[BUFFER_SIZE];

Use an nsTArray<uint8_t> here.  It's nice to avoid putting too much on the stack (I know BUFFER_SIZE is only 4kB, but changing it would require reasoning about stack sizes, etc.) and has a minor benefit in terms of exploit difficulty in case of a buffer overrun.

@@ +195,5 @@
> +
> +uint32_t
> +MP3Demuxer::Read(uint8_t* aBuffer, const uint32_t aOffset, const uint32_t aSize) {
> +  uint32_t read = 0;
> +  mSource->ReadAt(aOffset, aBuffer, aSize, &read);

Check the result of ReadAt.

@@ +498,5 @@
> +}
> +
> +uint16_t
> +MP3Demuxer::ID3Parser::Header::Version() const {
> +  return *reinterpret_cast<const uint16_t*>(mRaw + ID_END);

Doesn't this need to be handled with a particular endianness?

@@ +508,5 @@
> +}
> +
> +uint32_t
> +MP3Demuxer::ID3Parser::Header::RawSize() const {
> +  return *reinterpret_cast<const uint32_t*>(mRaw + FLAGS_END);

Doesn't this need to be handled with a particular endianness?

::: media/libstagefright/binding/include/mp4_demuxer/MP3TrackDemuxer.h
@@ +11,5 @@
> +namespace mp4_demuxer {
> +
> +class MP3Demuxer : public mozilla::TrackDemuxer {
> +public:
> +  static const int BUFFER_SIZE = 4096;

Can this be moved to the implementation file?  It doesn't seem to be used in the header.

@@ +12,5 @@
> +
> +class MP3Demuxer : public mozilla::TrackDemuxer {
> +public:
> +  static const int BUFFER_SIZE = 4096;
> +  static const int MAX_FRAMES = 128;

This doesn't seem to be used.

@@ +29,5 @@
> +      static const int FLAGS_END = VERSION_END + FLAGS_LEN;
> +      static const int SIZE_END = FLAGS_END + SIZE_LEN;
> +      static const int SIZE = SIZE_END;
> +
> +      static const uint8_t ID[ID_LEN];

Again, can these be moved to the implementation file?

@@ +94,5 @@
> +    private:
> +      static const int SYNC1 = 0;
> +      static const int SYNC2_VERSION_LAYER_PROTECTION = 1;
> +      static const int BITRATE_SAMPLERATE_PADDING_PRIVATE = 2;
> +      static const int CHANNELMODE_MODEEXT_COPY_ORIG_EMPH = 3;

Same comment re: moving these to the implementation.

@@ +140,5 @@
> +    static const uint16_t SAMPLE_RATE[4][4];
> +    // Samples per frame - use [version][layer]
> +    static const uint16_t FRAME_SAMPLE[4][4];
> +    // Slot size (MPEG unit of measurement) - use [layer]
> +    static const uint8_t SLOT_SIZE[4];

Same comment re: moving these to the implementation.

@@ +161,5 @@
> +  int64_t Duration(const int64_t aNumFrames) const;
> +
> +  virtual void Seek(Microseconds aTime);
> +  virtual mozilla::MediaSample* DemuxSample();
> +  virtual Microseconds GetNextKeyframeTime();

Mark all of the member functions inherited from TrackDemuxer with the override keyword.
Attachment #8585816 - Flags: feedback?(kinetik) → feedback+
Assignee: kinetik → esawin
Updated based on feedback, fixed some issues, refactored and rebased on new MediaRawData interface.

This version passes the ID3 and frame header tests with approximate duration matching.

TODO:
* some more refactoring
* function comments
* extract VBR header info for more accurate seeking
Attachment #8585816 - Attachment is obsolete: true
WIP: approximate seeking.
Tests.

TODO:
* seeking tests
* VBR average/sample bitrate tests
Fixed frame syncing and duration mapping.
Attachment #8592868 - Attachment is obsolete: true
Extended frame parser tests.

There is a discrepancy between the expected and the actual total number of parsed samples, which I'm looking into.

Otherwise, the tests make sure we parse the leading ID3 headers and MPEG frames correctly and the duration estimation must be within an error range of 1%, which we should be able to improve on in future.
Attachment #8592872 - Attachment is obsolete: true
Attachment #8593337 - Flags: review?(kinetik)
Simplified MP3 resource definition.
Attachment #8593337 - Attachment is obsolete: true
Attachment #8593337 - Flags: review?(kinetik)
Attachment #8593357 - Flags: feedback?(kinetik)
> +    'noise.mp3',
> +    'noise_vbr.mp3',

The two test files above are missing from patch #3.
Now exposing VBR header info for testing.
Attachment #8593336 - Attachment is obsolete: true
Added missing MP3 files and VBR info tests.

Currently, all frame/sample count tests are disabled until I've found a reasonable way to determine a reference value which accounts for trailing frames.
Attachment #8593357 - Attachment is obsolete: true
Attachment #8593357 - Flags: feedback?(kinetik)
Attachment #8594033 - Flags: feedback?(kinetik)
Fully commented version. Fixed a duration estimation bug.

The base functionality is there and working (based on the current tests).
This should be a good base for further iterations.

I would like add some logging next to help with integration and troubleshooting.
Attachment #8594031 - Attachment is obsolete: true
Attachment #8595029 - Flags: review?(kinetik)
Attachment #8594033 - Flags: feedback?(kinetik) → review?(kinetik)
Merged seeking with the main patch, improved seeking support and added channel mode field access.
Attachment #8592870 - Attachment is obsolete: true
Attachment #8595029 - Attachment is obsolete: true
Attachment #8595029 - Flags: review?(kinetik)
Updated duration tests.
Attachment #8594033 - Attachment is obsolete: true
Attachment #8594033 - Flags: review?(kinetik)
Attachment #8596803 - Flags: review?(kinetik)
Attachment #8596804 - Flags: review?(kinetik)
Comment on attachment 8596804 [details] [diff] [review]
0002-Bug-1093815-Add-tests-for-MP3-track-demuxer.patch

Review of attachment 8596804 [details] [diff] [review]:
-----------------------------------------------------------------

Maybe add a test for the channel mode as well?
Attachment #8596804 - Flags: review?(kinetik) → review+
(In reply to Eugen Sawin [:esawin] from comment #22)
> Created attachment 8596803 [details] [diff] [review]
> 0001-Bug-1093815-Add-MP3-track-demuxer.patch
> 
> Merged seeking with the main patch, improved seeking support and added
> channel mode field access.

Matthew, can we get this review done? It's holding us up.
Flags: needinfo?(kinetik)
Comment on attachment 8596803 [details] [diff] [review]
0001-Bug-1093815-Add-MP3-track-demuxer.patch

Review of attachment 8596803 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay reviewing this.  Generally looks great, r+ with comments addressed.

I've done some preliminary fuzzing with afl-fuzz and it looks fine (it produced 9 files it claimed caused the code to hang, but they all seem to be false positives), but it probably needs some more time spent to be reasonably sure this is safe.  I'm particularly nervous here because parsing code regularly suffers from security issues and the previous MP3FrameParser-based code has seen a number of security bugs filed against it.

Please remove all of the trailing whitespace.

File a bug for reworking the code to use ByteReader rather than raw pointer access to reduce the risk of security issues since you had a preference not to make those changes in this patch.

File a bug for remove the existing MP3/ID3 parsing code in MP3FrameParser.{cpp,h} and replacing the existing callers with calls to the new code.

File a bug to improve seeking accuracy using the VBR information and reference that in the existing TODO in the code.

Review the changes made to the old/existing ID3 parsing code in bug 949036 and update the patch or file a bug to apply them to the new code if they're needed.  It looks like it's not needed, but please verify.

::: media/libstagefright/binding/MP3TrackDemuxer.cpp
@@ +71,5 @@
> +  if (!sample) {
> +    return nullptr;
> +  }
> +  return sample.forget();
> +}

I'm not sure what the purpose of DemuxSample() and GetNext() being split is, but it seems like this could just be "return GetNext()"?

@@ +90,5 @@
> +    return -1;
> +  }
> +
> +  // Assume we know the exact number of frames from the VBR header.
> +  double numFrames = mParser.VBRInfo().NumFrames();

Since NumFrames() returns an int64_t and Duration(...) takes one, it'd be nice to avoid the conversion to double here.

@@ +96,5 @@
> +    if (mStreamLength < 0) {
> +      // Unknown length, we can't estimate duration.
> +      return -1;
> +    }
> +    numFrames = (mStreamLength - mFirstFrameOffset) / AverageFrameLength();

...and then this case can be converted from double -> int64_t here instead.

@@ +149,5 @@
> +  }
> +
> +  // Full frame parsed, move offset to its end.
> +  mOffset = frame->mOffset + frame->mSize;
> +  mTotalFrameLen += frameLen;

mTotalFrameLen could overflow if the file is repeatedly seeked and demuxed, since it's not reset upon seeking and accumulates over time.  Perhaps unlikely, but we should either check for it or reset it upon seeking.  Same applies to mNumParsedFrames and mFrameIndex, but even less likely as it's require a huge number of seeks!

@@ +185,5 @@
> +MP3Demuxer::AverageFrameLength() const {
> +  if (!mNumParsedFrames) {
> +    return 0.0;
> +  }
> +  return static_cast<double>(mTotalFrameLen) / mNumParsedFrames;

It seems like this will return incorrect results for VBR streams after seeking since mTotalFrameLen and mNumParsedFrames are not reset, causing AverageFrameLength() to suffer from hysteresis.  Even though we can't provide accurate seeking in VBR MP3 (at least, without using the VBR info), it seems preferable that a seek to the same timecode results in the same estimated offset being calculated each time.

An example to make my point clearer: imagine a file with four frames of length 1, 1, 9, and 9.  After demuxing the first two blocks, AFL() would return 1 and after demuxing all four blocks, it'd return 5.  After this, if we then Seek() to the beginning and demux the first two blocks again, AFL() now returns 3.666..., so the computed offset of the next seek to the same location will differ.

@@ +191,5 @@
> +
> +// FrameParser
> +
> +namespace frame_header {
> +static const int SYNC1 = 0;

Please add a comment explaining that these are byte offsets into mRaw, and the largest value must not exceed the value of SIZE.  Possibly add a static assert to that effect, also.

@@ +279,5 @@
> +
> +// FrameParser::Header
> +
> +FrameParser::FrameHeader::FrameHeader()
> +  : mPos(0)

For all of the fields that need to be reset by Reset(), please do it there and call Reset() from the constructor so it's harder to miss updating Reset() in the future.

@@ +552,5 @@
> +  if (!mHeader.IsValid() || !mHeader.SampleRate()) {
> +    return 0;
> +  }
> +
> +  const float bitsPerSample = mHeader.SamplesPerFrame() / 8.0f;

This doesn't seem to need to be a float.

@@ +621,5 @@
> +
> +// ID3Parser::Header
> +
> +ID3Parser::ID3Header::ID3Header()
> +  : mSize(0), mPos(0)

For all of the fields that need to be reset by Reset(), please do it there and call Reset() from the constructor so it's harder to miss updating Reset() in the future.

::: media/libstagefright/binding/include/mp4_demuxer/MP3TrackDemuxer.h
@@ +271,5 @@
> +
> +  // We keep the first parsed frame around for static info access, the
> +  // previously parsed frame for debugging and the currently parsed frame.
> +  Frame mFirstFrame;
> +  Frame mPrevFrame;

Make mPrevFrame, LastFrame(), and PrevFrame() debug-only.  Since they're only used in the TestMP3Demuxer unit test, a define to only build them for that case would be ideal.

@@ +275,5 @@
> +  Frame mPrevFrame;
> +  Frame mFrame;
> +};
> +
> +// The MP3 demuxer used extract MPEG frames and side information out of

Missing word, "to" maybe?

@@ +340,5 @@
> +  // Current byte offset in the source stream.
> +  uint64_t mOffset;
> +
> +  // Byte offset of the begin of the first frame, or -1 if none parsed yet.
> +  uint64_t mFirstFrameOffset;

Setting an unsigned integer to -1 is a bit surprising.  Looks like this can just be initialized to 0 instead.
Attachment #8596803 - Flags: review?(kinetik) → review+
Flags: needinfo?(kinetik)
Thanks for the review!

> 
> File a bug for reworking the code to use ByteReader rather than raw pointer
> access to reduce the risk of security issues since you had a preference not
> to make those changes in this patch.

I will either file a bug or refactor it directly before landing.


> 
> File a bug to improve seeking accuracy using the VBR information and
> reference that in the existing TODO in the code.

Yes, we definitely need follow-up bugs to improve seeking performance/correctness in general. It would be nice to have unit tests for seeking, too.


> 
> Review the changes made to the old/existing ID3 parsing code in bug 949036
> and update the patch or file a bug to apply them to the new code if they're
> needed.  It looks like it's not needed, but please verify.

The old ID3 parsing code wasn't safe. I did make sure to add some checks in the ID3 header detection to prevent false positives. However, the checks in bug 949036 are even more thorough, so we could extend the checks eventually to reflect that.


> 
> I'm not sure what the purpose of DemuxSample() and GetNext() being split is,
> but it seems like this could just be "return GetNext()"?

That's currently more of a logical interface/implementation split, there are no structural reasons. Maybe we would want some form of indexing in future, which would go into DemuxSample.


> 
> mTotalFrameLen could overflow if the file is repeatedly seeked and demuxed,
> since it's not reset upon seeking and accumulates over time.  Perhaps
> unlikely, but we should either check for it or reset it upon seeking.  Same
> applies to mNumParsedFrames and mFrameIndex, but even less likely as it's
> require a huge number of seeks!

I've been worried about overflows but resetting on seeking would lead to inaccurate consecutive seeks or break it completely. This affects only mTotalFrameLen and mNumParsedFrames, which are used to approximate the average frame length.

A practical fix for this would be to calculate the moving average frame length instead - discarding previous frame length results at a preconfigured rate - without the need to keep accumulative values.

I don't see a reasonable way to handle overflows of mFrameIndex which is tied to and strictly less than mOffset - the stream read offset.

> 
> It seems like this will return incorrect results for VBR streams after
> seeking since mTotalFrameLen and mNumParsedFrames are not reset, causing
> AverageFrameLength() to suffer from hysteresis.  Even though we can't
> provide accurate seeking in VBR MP3 (at least, without using the VBR info),
> it seems preferable that a seek to the same timecode results in the same
> estimated offset being calculated each time.

Yes, this is something I have to address and the moving average solution against overflows would most likely not be an improvement on this aspect.

In pratice, this might not be such a huge issue as the duration computation is just an approximation and frame length variance should be considerably low for anything but long durations of silence.
However, I can think of scenarios where our approximations would go off, as in your example, and we should add mechanisms to stabilize it.


> > +  const float bitsPerSample = mHeader.SamplesPerFrame() / 8.0f;
> 
> This doesn't seem to need to be a float.

That's just to avoid an explicit cast in the next line for the calculation of frameLen, where all variables would be integers otherwise and we would get inaccuracies through intermediate result rounding.


> 
> Setting an unsigned integer to -1 is a bit surprising.  Looks like this can
> just be initialized to 0 instead.

Wow, thanks for catching this!
I think I would prefer the first iteration to have more robust albeit slower seeking support and refine it for better performance in the follow-up patch instead.

This is a hybrid between the fast approximate seeking jumps and slow forward-reading seeking, which should give us more accurate results for now.

I would merge it with the main patch on r+.
Attachment #8603392 - Flags: review?(kinetik)
Attachment #8603392 - Flags: review?(kinetik) → review+
Blocks: 1163667
Merged stable seeking into main patch and updated according to review comments.
Attachment #8596803 - Attachment is obsolete: true
Attachment #8603392 - Attachment is obsolete: true
Attachment #8604835 - Flags: review+
Cross-platform compatibility update with minor syntax fixes.
Attachment #8596804 - Attachment is obsolete: true
Attachment #8604836 - Flags: review+
It looks like we need to remove some redundant typedefs to prevent build errors (C2872) on Windows.
Attachment #8604837 - Flags: review?(kinetik)
Attachment #8604837 - Flags: review?(kinetik) → review+
tracking-fennec: 39+ → 41+
Unfortunate we didn't coordinate our efforts here.. All the MP4Reader is going :(
(In reply to Jean-Yves Avenard [:jya] from comment #35)
> Unfortunate we didn't coordinate our efforts here.. All the MP4Reader is
> going :(

Does the MP3TrackDemuxer need to implement the MediaTrackDemuxer interface from bug 1154164 now instead of TrackDemuxer?
Flags: needinfo?(jyavenard)
Yes.

We need a MP3Demuxer : MediaDataDemuxer that can create a MP3TrackDemuxer : MediaTrackDemuxer.

The definition of MediaDataDemuxer and MediaTrackDemuxer is in dom/media/MediaDataDemuxer.h

An implementation of it is in dom/media/fmp4/MP4Demuxer

To test you can apply the patches from bug 1156708 (I should have them pushed to inbound very soon)
Flags: needinfo?(jyavenard)
Blocks: 1166779
Duplicate of this bug: 1169142
Depends on: 1169142
Blocks: 1182100
Tested on several Android versions, including L, and the mp3 playback is working fine 
Verifying as fixed on latest Nightly and Aurora builds
Status: RESOLVED → VERIFIED
Blocks: 1191281
Release Note Request (optional, but appreciated)
[Why is this notable]: MP3 playback finally working on Android L
[Suggested wording]: Support for playing MP3 files on Android 5.0 (Lollipop) and above.
[Links (documentation, blog post, etc)]:
relnote-firefox: --- → ?
You need to log in before you can comment on or make changes to this bug.