Closed Bug 960873 Opened 6 years ago Closed 6 years ago

[b2g] mp3 buffers extremely memory inefficient with jemalloc

Categories

(Core :: Audio/Video, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

()

RESOLVED FIXED
mozilla30
blocking-b2g 1.3+
Tracking Status
firefox28 --- wontfix
firefox29 --- wontfix
firefox30 --- fixed
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed

People

(Reporter: bkelly, Assigned: bkelly)

References

Details

(Keywords: perf, Whiteboard: [c=memory p=5 s= u=1.3+tarako] [MemShrink:P1][clownshoes])

Attachments

(10 files, 33 obsolete files)

69.57 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
4.59 KB, patch
cpearce
: review+
Details | Diff | Splinter Review
2.40 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
1.94 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
3.29 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
12.69 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
4.43 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
2.51 KB, patch
cpearce
: review+
Details | Diff | Splinter Review
12.72 KB, patch
Details | Diff | Splinter Review
69.80 KB, patch
Details | Diff | Splinter Review
While investigating bug 957346 I noticed some odd memory behavior.  While the about-memory report showed 16MB of decoded-audio for the poppit app, there seemed to be another 15MB of heap-unclassified that went with it.  If I cleared the buffers, this heap-classified went away.

I spent a long time thinking this was due to OMX decoder resources or something, but after a day of investigation that appears not to be the case.

Instead I found we are losing significant amount of memory due to a mismatch between the mp3 frame size (4608 bytes) and the jemalloc arena bins (4096 and 8192 in this case).

After the OMX decoder produces a frame we allocate a buffer of matching size here:

  http://mxr.mozilla.org/mozilla-central/source/content/media/omx/MediaOmxReader.cpp#323

Using code like:

  nsAutoArrayPtr<AudioDataValue> buffer(new AudioDataValue[frame.mSize/2] );

This results in a heap allocation of 4608 bytes.  It appears, however, jemalloc really gives us a buffer of 8192 bytes.  For example, see the comments here:

  http://mxr.mozilla.org/mozilla-central/source/memory/mozjemalloc/jemalloc.c#47

Specifically:

  * Allocation requests are rounded up to the nearest size class, and no record
  * of the original request size is maintained.

I tested this by hardcoding an 8000 byte allocation instead of the 4608.  The app memory usage b2g-info and about-memory remained the same.

Unfortunately the 4608 byte mp3 frame size is just a poor fit with the jemalloc bucket sizes.  We end up wasting 3584 bytes per mp3 frame!  This about 43% waste.  Since apps like games tend to pre-load sound effects to be low-latency this wasted memory can sit around for long periods.

This effect is largely mitigated by Web Audio if the decoded data is stored in a single large buffer and then played with AudioContext.createBufferSource().  Still, HTML5 audio seems widespread so it would be useful to mitigate this memory wastage if we can.

Unfortunately we probably don't want to introduce latency by combining multiple frames into a single heap allocation.  The only two solutions I can think of are:

1) Implement a simple, fixed-size block allocator within the media component for mp3 frames.
2) Add an additional bucket size to jemalloc.

I haven't looked to closely at jemalloc, so I'm not sure how easy it will be to do (2).  The table in the comment does mention some odd sizes like 1020kb, so maybe it has some tunable value for how efficient an allocation has to be to draw from the arena.

I think (1) is clearly distasteful, but would be doable.  We should try to tweak maintain a single allocator to avoid fragmenting our memory if we can, though.

Any other ideas?
This heavily effects the poppit app and any game using HTML5 audio.
Blocks: b2g-poppit
I'm basing the information about 4608 bytes being the largest mp3 frame size on this:

  #define BUFSIZE   8192   // big enough to hold 4608 bytes == biggest mp3 frame

Which is found in the android libstagefright code:

  ./frameworks/base/media/libstagefright/codecs/mp3dec/src/pvmp3_dec_defs.h
Whiteboard: [c=memory p= s= u=][memshrink][tarako] → [c=memory p= s= u=][MemShrink][tarako][clownshoes]
:njn recommended layering on an arena allocator, such as provided in:

  nsprpub/lib/ds/plarena.h

I'll investigate this tomorrow.
Yeah, this is just unlucky sizing. Adding a new size class to jemalloc sounds hard.

Probably the best bet is to use an arena allocator. If the chunk size is 32KB, you'll fit 7.11 frames per chunk, which is very little wasted space. If these things tend to die together, that's even better, because it's a natural fit for an arena allocator.  See nsprpub/lib/ds/plarena.h.
Looking at plarena.h it appears the trick will be to manage the life-cycle of the allocations in a way that fits with the media code.  Since there is no de-allocation support in plarena.h we'll need to have a good place to release the entire arena.  We'll probably also need to layer on free-list as well to support streamed media.

I think perhaps the best place to do this would be MediaQueue defined here:

  http://mxr.mozilla.org/mozilla-central/source/content/media/MediaDecoderReader.h#321

Each MediaQueue would have its own arena.

I was thinking I could modify the MediaQueue API such that it performs the allocation of T[] arrays like AudioData[] instead of having the client allocate before Push().  It could then perform the correct deallocation for the arena in the Pop() method.  The main downside here is the data would either need to be copied out to be returned from Pop() or require the caller to use Peak() and then Pop().

The arena could be lazily initialized the first time data is Push()'d without an arena present.  The arena could be free'd eagerly when all the buffers have been drained.

Paul, Chris, what do you think of this plan?  Do have any other ideas on a better place to tackle this?
Flags: needinfo?(padenot)
Flags: needinfo?(cpearce)
To clarify, I was planning to PL_InitArenaPool() in the first MediaQueue::Push() call and then PL_ArenaPoolFinish() when the last buffer is drained from MediaQueue::Pop().  The goal here being to truly free the media buffers back to the heap (vs using PL_ARENA_RELEASE() and keeping them in the shared arena pool free list).
Whiteboard: [c=memory p= s= u=][MemShrink][tarako][clownshoes] → [c=memory p=4 s= u=][MemShrink][tarako][clownshoes]
Whiteboard: [c=memory p=4 s= u=][MemShrink][tarako][clownshoes] → [c=memory p=4 s= u=tarako][MemShrink][tarako][clownshoes]
Priority: -- → P1
Blocks: 128RAM
More thoughts on how to proceed:

1) Create a MediaArenaAllocator class that will contain the plarena.  It will also track when all allocations have been free'd and Finish the plarena to restore memory to the system heap.  A new plarena will then be created if necessary on the next allocation.
2) Add a MediaArenaAllocator mArena to each MediaDecoderReader object.
3) Add a MediaDecoderReader::CreateAudioData() method to replace instances of new AudioData().  This will allow the reader instance to pass its mArena to each AudioData object.
4) Modify the AudioData constructor to take a buffer size parameter instead of a buffer pointer.  A new buffer will then be allocated using the mArena provided in (3) above.  The AudioData will be the sole owner of the buffer memory for its lifetime.  Clients can still access the memory directly to copy the data into the buffer and otherwise manipulate it as done today.
5) The AudioData destructor will use mArena to free the buffer.
6) Modify all instead of new AudioData() to use the new MediaDecoderReader::CreateAudioData() method instead. ***

*** Note: I could only find one place where new AudioData() is used, but not within a MediaDecoderReader object.  This is in the fmp4 decoder.  I'll have to investigate a solution for this.

So instead of code doing this:

  nsAutoArrayPtr<AudioDataValue> buffer = new AudioDataValue[numSamples];
  memcpy(buffer, src, numSamples*2);
  AudioQueue().Push(new AudioData(..., buffer.forget(), ...));

They will instead do:

  nsAutoPtr<AudioData> data = CreateAudioData(..., numSamples, ...);
  memcpy(data->mAudioData, src, numSamples*2);
  AudioQueue().Push(data.forget());

Chris, Paul, let me know if I'm off in the weeds.  :-)
I think we can fix this with less complexity. To avoid fragmentation, we will submit audio in chunks of 4096 - N bytes. So basically loop over the Push(AudioData) and calculate the right lengths and offsets.

N has to be chosen according to whatever jemalloc's reserve is that it enlarges the allocation by.

This setup will waste very little memory and will also lead to memory being freed faster because we can free the page the moment the last byte is played instead of keeping a large arena alive.
(In reply to Andreas Gal :gal from comment #8)
> This setup will waste very little memory and will also lead to memory being
> freed faster because we can free the page the moment the last byte is played
> instead of keeping a large arena alive.

Well, I was trying to arrange it so that the memory could be free'd after the last sample is played by having a separate arena per media source.  Still, I think you're right that re-framing the data would just be simpler.

Re-framing the 4608 byte buffer would end up using two jemalloc blocks at sizes 4096 and 1024.  This is a savings of 3072 bytes per mp3 frame.

Ideally I think we should unify this logic somewhere so all the various codec providers could use it.  I think this could be done with a new method on AudioData.  I'll take a look on Tuesday.

Are there any assumptions elsewhere in the media code that would prevent re-framing?  For example, do we expect audio and video to remain in sync on a buffer-by-buffer basis?  Should we be concerned about rounding errors in re-calculating the duration of the new frames, etc?  Sorry if these are stupid questions, but I'm still trying to wrap my head around all the different code paths and configurations in media.

Thanks!
roc can answer your questions I am sure.
Flags: needinfo?(roc)
Actually, it occurred to me we could make a new C++ type that appears to be a flat 4608 (or whatever length) array, but internally keeps multiple jemalloc heap buffers.  We could call it CompactHeapArray<T> or something.

This would make the fix largely transparent to the media code.  It could also be re-used for other allocation slop issues that might arise.
> Actually, it occurred to me we could make a new C++ type that appears to be
> a flat 4608 (or whatever length) array, but internally keeps multiple
> jemalloc heap buffers.  We could call it CompactHeapArray<T> or something.
> 
> This would make the fix largely transparent to the media code.  It could
> also be re-used for other allocation slop issues that might arise.

I recommend not generalizing this type. It's rare that slop is a big issue for fixed-size allocations. And generalizing it would make it much more complex -- you'll have to build in knowledge of jemalloc's size classes, and then have some kind of algorithm for determining how to break up arbitrary-sized allocations.
+1 to comment 12. A quick fix here will do.
Sounds good. I will start with a minimal solution just for the OMX decoder. We can generalize to the other decoders later if necessary. Thanks for the guidance!
This is how chunking could work. This should be easily usable from other readers as well. That having said I wonder why we are pulling 16mb of data out of the decoder to begin with. Shouldn't we limit how much data we decode by slowing down the decoder once we have a good amount of data buffered? (lets say 512kb). That should make the impact of padding also much less dramatic. roc?
(In reply to Andreas Gal :gal from comment #15)
> That having said I wonder why we are pulling 16mb of data
> out of the decoder to begin with. Shouldn't we limit how much data we decode
> by slowing down the decoder once we have a good amount of data buffered?
> (lets say 512kb). That should make the impact of padding also much less
> dramatic. roc?

Just to clarify, this is not 16MB for a single media file.  The app is using duplicate <audio> elements to get multi-channel audio:

  http://phoboslab.org/log/2011/03/multiple-channels-for-html5-audio

In particular this is done in the impactjs game library with a default of 4 duplicate <audio> elements for each sound effect.  The sound effects for the game are only 4MB spread out over 33 mp3 file, but they are each loaded 4 times by the app.

I already have a tech evangelism bug open to work with the game developer to tune/fix this.  See bug 959603.

Also, I've been in contact with the author of impactjs, Dominic Szablewski.  The git version of the library already has support for Web Audio which avoids the duplicate load problem (and this bug).  He was kind enough to give me a copy and I verified it worked well on fxos.  The Web Audio implementation dropped poppit memory usage from 70MB to 40MB.

Dominic plans to release the Web Audio version of impactjs in the next few weeks.  He's also planning to write about the audio memory improvements as part of the release.  We should probably do our own blog to educate the game community about the benefits of web audio over HTML5 for multi-channel sound effects.
Cool. If you can polish the attached patch and land it we should be good here then. Thanks!
(In reply to Andreas Gal :gal from comment #17)
> Cool. If you can polish the attached patch and land it we should be good
> here then. Thanks!

Will do.  With the holiday weekend I probably won't get to it until Tuesday.  Thanks!
> While
> the about-memory report showed 16MB of decoded-audio for the poppit app,
> there seemed to be another 15MB of heap-unclassified that went with it.

Most memory reporters use a MallocSizeOf function to measure heap blocks, and those functions include slop bytes caused by jemalloc in their measurement. So I was wondering why this was happening. Turns out the audio reporter computes its memory consumption analytically, which is sub-optimal for exactly this reason. I filed bug 961441 to convert it to use a MallocSizeOf function.
Splitting audio samples up into smaller buffers seems like a reasonable solution.

To reduce sloop further, we could make AudioData data blocks be exactly the size required by jemalloc, and just buffer the remaining samples until the next time we push audio into the audio queue and we have enough data to fill a complete block. We'd need to ensure we enqueue the remaining samples in a new block once we reach end of stream.

(In reply to Ben Kelly [:bkelly] from comment #9)
> Are there any assumptions elsewhere in the media code that would prevent
> re-framing?

No. Provided the AudioData blocks are timestamped correctly, the media code should handle AudioData blocks being split into multiple AudioData buffers just fine.
Flags: needinfo?(cpearce)
Comment on attachment 8362152 [details] [diff] [review]
Draft patch, completely uncompiled and untested.

Review of attachment 8362152 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/MediaDecoderReader.h
@@ +629,5 @@
> +          size_t chunk_size = malloc_good_size(AudioDataSize(chunk_frames));
> +          // Adjust duration and frames so we allocate the largest number of
> +          // frames into this chunk without going over the chunk_size.
> +          frames = chunk_size / sizeof(AudioDataValue);
> +          duration = duration * frames / aFrames;

We should pass the sample rate into this function, and use FramesToUsecs here instead.

In fact, we should just pass the sample rate into the constructor and calculate the duration in the constructor, rather than relying on the caller to do it.

@@ +644,5 @@
> +                                     aChannels));
> +
> +      // Remove the frames we just pushed into the queue and loop if there is
> +      // more to be done.
> +      aOffset += AudioDataSize(frames);

aOffset is the offset in the uncompressed stream, it's OK if it's approximate, so I'd not bother doing this.
Comment on attachment 8362152 [details] [diff] [review]
Draft patch, completely uncompiled and untested.

Review of attachment 8362152 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/MediaDecoderReader.h
@@ +622,5 @@
> +        size_t chunk_frames = frames;
> +        do {
> +          chunk_frames /= 2;
> +        } while (chunk_frames > 0 &&
> +                 malloc_good_size(AudioDataSize(chunk_frames)) > size);

It's worth double-checking, but I don't think malloc_good_size() is available on all platforms. Bug 814498 is open for adding moz_malloc_good_size(), but hasn't been resolved. It might not even be able to be implemented on all platforms, in which case you'll need fallback code (e.g. don't bother trying to avoid slop).
Attachment #8362152 - Flags: review-
Flags: needinfo?(paul)
Depends on: 814498
Whiteboard: [c=memory p=4 s= u=tarako][MemShrink][tarako][clownshoes] → [c=memory p=4 s= u=tarako][MemShrink:P1][tarako][clownshoes]
Attached patch audio-slop.patch v1 (obsolete) — Splinter Review
Here's a first cut at a working patch.  Many thanks to Andreas for the starting point.

Just asking for feedback at this point.  I need to split the patch up a bit and I'd like to see if can modify the other codecs to use PushAudioData() as well.

Note: I introduced a copy functor in order to support decoders like webm that appear to use something other than a native, flat buffer.
Attachment #8362152 - Attachment is obsolete: true
Attachment #8364660 - Flags: feedback?(n.nethercote)
Attachment #8364660 - Flags: feedback?(cpearce)
Comment on attachment 8364660 [details] [diff] [review]
audio-slop.patch v1

Review of attachment 8364660 [details] [diff] [review]:
-----------------------------------------------------------------

Seems ok, though I didn't entirely understand it. I suspect cpearce will be able to give a more thorough review.

::: content/media/MediaDecoderReader.h
@@ +631,5 @@
> +public:
> +  /**
> +   * Push audio data into the queue.  The queue is responsible for allocating
> +   * AudioDataValue[] buffers.  The caller must provide a functor to copy
> +   * the data into the buffers.  The function must provide the following

s/function/functor/ ?

@@ +632,5 @@
> +  /**
> +   * Push audio data into the queue.  The queue is responsible for allocating
> +   * AudioDataValue[] buffers.  The caller must provide a functor to copy
> +   * the data into the buffers.  The function must provide the following
> +   * signature:

Nit: why mix C and C++ comments? I'd make this one a C++ comment, and save the wasted line at the top and bottom.

@@ +665,5 @@
> +      uint32_t samples = chunkSize / sizeof(AudioDataValue);
> +      nsAutoArrayPtr<AudioDataValue> buffer(new AudioDataValue[samples]);
> +
> +      // Copy audio data to buffer using caller-provided functor.
> +      uint32_t framesCopied = aCopyFunc(buffer, samples);

Is it possible to copy part of a frame? Would that leave things in a problematic state? It seems like it would. If it's not possible, can that be documented?
Attachment #8364660 - Flags: feedback?(n.nethercote) → feedback+
(In reply to Nicholas Nethercote [:njn] from comment #24)
> Seems ok, though I didn't entirely understand it. I suspect cpearce will be
> able to give a more thorough review.

Thanks!  If you are ok with my use of malloc_good_size() and MOZ_MEMORY, then I will direct all future review requests to Chris.

> @@ +632,5 @@
> > +  /**
> > +   * Push audio data into the queue.  The queue is responsible for allocating
> > +   * AudioDataValue[] buffers.  The caller must provide a functor to copy
> > +   * the data into the buffers.  The function must provide the following
> > +   * signature:
> 
> Nit: why mix C and C++ comments? I'd make this one a C++ comment, and save
> the wasted line at the top and bottom.

Ah, I just added this with doxygen style comments out of habit.  I'll convert it to match mozilla style.

> @@ +665,5 @@
> > +      uint32_t samples = chunkSize / sizeof(AudioDataValue);
> > +      nsAutoArrayPtr<AudioDataValue> buffer(new AudioDataValue[samples]);
> > +
> > +      // Copy audio data to buffer using caller-provided functor.
> > +      uint32_t framesCopied = aCopyFunc(buffer, samples);
> 
> Is it possible to copy part of a frame? Would that leave things in a
> problematic state? It seems like it would. If it's not possible, can that be
> documented?

The onus to copy full frames is on the client code providing the functor.  This is due to the fact that different codecs may structure their source frame data differently, so only they know how to copy from it.  I'll add a comment indicating that partial frames will be ignored and must not be copied.
Flags: needinfo?(roc)
> If you are ok with my use of malloc_good_size() and MOZ_MEMORY,
> then I will direct all future review requests to Chris.

Sounds fine.

> I'll add a
> comment indicating that partial frames will be ignored and must not be
> copied.

Lovely!
Part 1 patch that implements just the helper method.  Some changes from the previous patch:

1) I renamed it to PushCompactAudioData() to describe why someone would want to use it.
2) I broke out the chunk size code into a separate routine to reduce template bloat.
Attachment #8364660 - Attachment is obsolete: true
Attachment #8364660 - Flags: feedback?(cpearce)
Attachment #8365354 - Flags: review?(cpearce)
Two follow-up patches show how the routine is used.  If these look reasonable I can update the other codecs.  In particular, we use opus in b2g which I think is in the webm codec.
Attachment #8365357 - Flags: review?(cpearce)
Attachment #8365354 - Attachment is obsolete: true
Attachment #8365354 - Flags: review?(cpearce)
Attachment #8365358 - Flags: review?(cpearce)
Comment on attachment 8365357 [details] [diff] [review]
Part 3: Make ogg/vorbis use MediaDecoderReader::PushCompactAudioData()

I'm going to obsolete the ogg/vorbis patch.  I don't think there is evidence of enough of a benefit here to offset the code complexity increase.  My testing of the ogg/vorbis sounds used in b2g keyboard suggest that this code already does a good job at compacting the buffers.  Its possible other vorbis files might be worse, but so far I have not seen it.
Attachment #8365357 - Attachment is obsolete: true
Attachment #8365357 - Flags: review?(cpearce)
The AppleMP3Reader class was also very inefficient.  It was allocating 98304 bytes for every 46080 bytes of decoded audio data.

This apple API is a bit different, however, in that we can specify how many frames to copy into a buffer we supply.  The size of this buffer is controlled by the AUDIO_MAX_FRAMES constant.  If we pick this to be something that divides evenly into 4608 and also match an allocation buffer size, then we end up with minimal slop.

This patch reduces AUDIO_MAX_FRAMES to 512 to match these conditions.  With this change we end up only allocating 49152 bytes for the same 46080 bytes of decoded audio.  This effectively cuts our memory footprint in half.
Attachment #8366058 - Flags: review?(cpearce)
After realizing that 512 frame buffers work very well for MP3 data in comment 32, it occurred to me its probably much simpler to just do the same thing for OMX.

While the MediaDecoderReader::PushCompactAudioData() approach could be used by multiple decoders, its probably not worth the added code complexity.  Each decoder needs to be examined individually and will likely need a slightly different solution.

Also, we have no evidence that the other decoders have this problem.  The issue is severe here due to the 4608 constant defined in the mp3 format.  It seems prudent simply to focus on solving this known problem for mp3 and leave the other decoders for now.

Sorry for the churn on this bug, but I'm going to obsolete the previous patches in favor of this simple patch.

In both cases about-memory shows a 12MB improvement in decoded-audio for the poppit game.
Attachment #8365356 - Attachment is obsolete: true
Attachment #8365358 - Attachment is obsolete: true
Attachment #8365356 - Flags: review?(cpearce)
Attachment #8365358 - Flags: review?(cpearce)
Attachment #8366093 - Flags: review?(cpearce)
Comment on attachment 8366093 [details] [diff] [review]
Part 1: Reframe OMX audio data to reduce alloc slop.

Review of attachment 8366093 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/omx/MediaOmxReader.cpp
@@ +334,2 @@
>  
> +  // OMX will often hand us an awkward number of frames, such as 4608.  This

4608 is the number of bytes, not the number of frames, right?
(In reply to Nicholas Nethercote [:njn] from comment #35)
> 4608 is the number of bytes, not the number of frames, right?

Hmm, yes.  According to wikipedia there are 1152 samples per block in layer II and III mpeg audio.  At 16-bit PCM with two channels this results in 4608 bytes.

I'll revisit these constants to make sure I didn't get confused.
From talking with Chris on IRC, its evident we need to address the issue in more codecs.  In addition to omx and apple, we also have directshow, gstreamer, and wfm that decode mp3 data.

This suggests that we should return to the centralized PushCompressedAudioData() method with a copy functor.  To try to simplify things we will share common copy functors where possible.  At first glance it appears that omx, gstream, and wfm could all use the same functor.

I'll post new patches tomorrow.  Sorry again for the churn.
Attachment #8366093 - Attachment is obsolete: true
Attachment #8366093 - Flags: review?(cpearce)
Attachment #8366058 - Attachment is obsolete: true
Attachment #8366058 - Flags: review?(cpearce)
Based on an IRC conversation with Chris, we'd like to put the audio compacting code and copy functors into a separate source file instead of lumping everything into MediaDecoderReader.h.

This patch is the first step towards doing that.  It moves some of the types into separate headers so we can reference them without creating circular dependencies.

Basically:

  MediaData, AudioData, VideoData move to MediaData.h / MediaData.cpp
  MediaInfo, AudioInfo, VideoInfo move to MediaInfo.h / MediaInfo.cpp
  MediaQueue moves to MediaQueue.h
Once more with the new files actually included in the patch.
Attachment #8367677 - Attachment is obsolete: true
This patch adds an AudioCompactor class that wraps a MediaQueue<AudioData> instance.  Pushing to the AudioCompactor will chop up the given buffer into malloc friendly sizes and push the resulting buffers into the queue.

The client must provide a copy functor, but a basic one is provided in AudioCompactor.  This common copy functor can be used when the source data is already in the native byte format; i.e. when the codec can get away with just using memcpy() now.
This patch modifies the omx codec to use the new AudioCompactor with the provided native copy functor.

I'll work on pushing this out to the other mp3 codecs tomorrow.
Attached patch bug960873_part4_gstreamer.patch (obsolete) — Splinter Review
Enable AudioCompactor in gstreamer.  Tested locally on ubuntu 12.04.
Whiteboard: [c=memory p=4 s= u=tarako][MemShrink:P1][tarako][clownshoes] → [c=memory p=5 s= u=tarako][MemShrink:P1][tarako][clownshoes]
Rebased for ImageFormat changes.
Attachment #8367679 - Attachment is obsolete: true
Re-based for header re-arrangement in Part 1.
Attachment #8367782 - Attachment is obsolete: true
I had to update the AudioCompactor patch to deal with the annoying problem of winbase.h #defining GetCurrentTime to GetTickCount.  The new AudioCompactor.cpp apparently causes the headers to get processed in a slightly different order permitting the bogus macro to leak through to MediaDecoderStateMachine.cpp.

To avoid this I added the #undef to MediaDecoderStateMachine.cpp.  I also added it to MediaDecoderStateMachine.h, since it seems like it should probably be there anyway given how its done in MediaDecoder.h.

Unfortunately, with unified builds, though, putting it in the headers is not enough.
Attachment #8368840 - Attachment is obsolete: true
Fix the gstreamer patch commit message.
Attachment #8368218 - Attachment is obsolete: true
Modify DirectShow codec to use AudioCompactor.  This uses a custom copy functor as the buffer is not in a flat, native format.

Tested on windows 8.1 with an mp3, although I'm not sure if I hit the single byte or two byte sample case.  Apparently I don't know how to get printf_stderr() to show on windows.  I added an exit() call to the copy functor, though, to verify I was actually hitting the new code.
Try run for patches to this point:

  https://tbpl.mozilla.org/?tree=Try&rev=9cacaa97296f

I want to write a gtest for the AudioCompactor, do a little more clean up, and then I will be ready for review.
Attachment #8368833 - Flags: review?(cpearce)
Add debug assertions and general code cleanup.
Attachment #8369014 - Attachment is obsolete: true
Attachment #8369251 - Flags: review?(cpearce)
gtest unit tests for the AudioCompactor.
Attachment #8369252 - Flags: review?(cpearce)
Attachment #8367783 - Attachment is obsolete: true
Attachment #8369253 - Flags: review?(cpearce)
Attachment #8369015 - Attachment is obsolete: true
Attachment #8369254 - Flags: review?(cpearce)
Attachment #8369016 - Attachment is obsolete: true
Attachment #8369255 - Flags: review?(cpearce)
This patch does not use AudioCompactor, but instead just tweaks the constant in AppleMP3Reader.cpp.  This is ultimately simpler for this class and most memory efficient.  In order to use AudioCompactor we would need to add an additional copy to the process.
Attachment #8369256 - Flags: review?(cpearce)
Fix commit log message.
Attachment #8369255 - Attachment is obsolete: true
Attachment #8369255 - Flags: review?(cpearce)
Attachment #8369257 - Flags: review?(cpearce)
Comment on attachment 8368833 [details] [diff] [review]
Part 1: Refactor MediaDecoderReader.h into separate headers. (v2)

Review of attachment 8368833 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/MediaInfo.cpp
@@ +15,5 @@
> +using layers::PlanarYCbCrImage;
> +using layers::PlanarYCbCrData;
> +
> +bool
> +VideoInfo::ValidateVideoRegion(const nsIntSize& aFrame,

It seems like we could just put this in VideoUtils.cpp instead, and rename it IsValidVideoRegion().

Then we won't need another file just for a single function.
Attachment #8368833 - Flags: review?(cpearce) → review+
Comment on attachment 8368833 [details] [diff] [review]
Part 1: Refactor MediaDecoderReader.h into separate headers. (v2)

Review of attachment 8368833 [details] [diff] [review]:
-----------------------------------------------------------------

Also, according to the style guide new files should have a different mode line than what you're using (and what's used in the vast majority of the code in content/media/ FWIW).

The mode line should be:
/* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
/* vim: set ts=8 sts=2 et sw=2 tw=80: */
/* This Source Code Form is subject to the terms of the Mozilla Public
 * License, v. 2.0. If a copy of the MPL was not distributed with this
 * file, You can obtain one at http://mozilla.org/MPL/2.0/. */

https://developer.mozilla.org/en-US/docs/Developer_Guide/Coding_Style#Mode_Line

But you're using "tab-width: 2". Please fix that in this patch and other patches before landing.

I only discovered when I requested review last week on some code! ;)
Comment on attachment 8369251 [details] [diff] [review]
Part 2: Create AudioCompactor class to minimize allocation slop. (v4)

Review of attachment 8369251 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!

::: content/media/AudioCompactor.cpp
@@ +14,5 @@
> +static size_t
> +MallocGoodSize(size_t aSize)
> +{
> +# if defined(MOZ_MEMORY)
> +    return malloc_good_size(aSize);

Two space indentation level please.

::: content/media/AudioCompactor.h
@@ +13,5 @@
> +namespace mozilla {
> +
> +class AudioCompactor {
> +public:
> +  AudioCompactor(MediaQueue<AudioData> &aQueue) :

The '&' and '*' in pointers and references should be left hugging, not right hugging, in Gecko code.

https://developer.mozilla.org/en-US/docs/Developer_Guide/Coding_Style#Operators

So make this:
 AudioCompactor(MediaQueue<AudioData>& aQueue)

There are other instances which need to change too.

@@ +70,5 @@
> +
> +    return true;
> +  }
> +
> +  // Copy functor suitable for source buffers already in native byte format.

// Copy functor suitable for copying audio samples already in the AudioDataValue format/layout expected by AudioStream on this platform. 

(We don't want to confuse audio float/PCM/planar issues with endianess issues here)
Attachment #8369251 - Flags: review?(cpearce) → review+
Attachment #8369252 - Flags: review?(cpearce) → review+
Attachment #8369253 - Flags: review?(cpearce) → review+
Attachment #8369254 - Flags: review?(cpearce) → review+
Need also to do content/media/plugins (android/fennec MP3).
Comment on attachment 8369256 [details] [diff] [review]
Part 7: Reduce apple mp3 MAX_AUDIO_FRAMES to minimize alloc slop.

Review of attachment 8369256 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/apple/AppleMP3Reader.cpp
@@ +14,5 @@
>  // Maximum number of audio frames we will accept from the audio decoder in one
> +// go.  Carefully select this to work well with both the mp3 1152 max frames
> +// per block and power-of-2 allocation sizes.  Since we must pre-allocate the
> +// buffer, choosing a value that divides exactly into 1152 is most memory
> +// efficient.

Add a note saying that in doing so we avoid the need to use the AudioDataCompactor.
Attachment #8369256 - Flags: review?(cpearce) → review+
Comment on attachment 8369257 [details] [diff] [review]
Part 6: Make DirectShow use AudioCompactor. (v3)

Review of attachment 8369257 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/directshow/DirectShowReader.cpp
@@ +251,5 @@
> +{
> +public:
> +  DirectShowCopy(uint8_t *aSource, uint32_t aBytesPerSample,
> +                 uint32_t aSamples, uint32_t aChannels) :
> +    mSource(aSource),

Going forward, constructor initalizers are supposed to be indented like so:

  MyClass()
    : mVar(0)
    , mVar2(0)

https://developer.mozilla.org/en-US/docs/Developer_Guide/Coding_Style#Whitespace

There are probably other instances of this, please update them before landing.

@@ +265,5 @@
> +    size_t maxSamples = std::min(aSamples, mSamples - mNextSample);
> +    uint32_t frames = maxSamples / mChannels;
> +    size_t byteOffset = mNextSample * mBytesPerSample;
> +    for (uint32_t i = 0; i < maxSamples; ++i) {
> +      if (mBytesPerSample == 1) {

Do the if (mBytes...) outside the loop, so we're not re-doing the comparison unnecessarily.
Attachment #8369257 - Flags: review?(cpearce) → review+
(In reply to Chris Pearce (:cpearce) from comment #63)
> Comment on attachment 8369257 [details] [diff] [review]
> https://developer.mozilla.org/en-US/docs/Developer_Guide/
> Coding_Style#Whitespace
> 
> There are probably other instances of this, please update them before
> landing.

Note: I only expect you to change the instances in your patches, I'm not asking you to change all of content/media/ to conform to the new guidelines!
Thanks for the review!  I'll get these items fixed up in the morning.

Also, it appears the new gtest is failing on tbpl.  I'll investigate and fix that as well.

For the android/fennec codec I'll need help testing.  I don't have an android device at the moment.
The gtest failed on tbpl because some of the build targets have trace-malloc enabled instead of jemalloc.  This means there is no |malloc_good_size()| provided and we fallback on not chunking at all.  This works fine, but breaks the assumptions in the gtest assertions.

I'll relax the assertions to check acceptable slop amounts instead of exact numbers of bytes.

I also realized that I forgot to put a test in for NativeCopy.  I'll add that as well.  I'll probably ask for re-review due to the number of changes.
Updated per review comments.  Carrying forward r=cpearce.

I intend to land Part 1 patch immediately once the try build turns green since this kind of patch can bit rot quickly.

  https://tbpl.mozilla.org/?tree=Try&rev=4792ac1a460f
Attachment #8368833 - Attachment is obsolete: true
Attachment #8369441 - Flags: review+
Missed a ValidateVideoRegion() call nested an extra directory level deep.  Updated to fix that and new try:

  https://tbpl.mozilla.org/?tree=Try&rev=637795cd0ebf
Attachment #8369441 - Attachment is obsolete: true
Attachment #8369475 - Flags: review+
Another build fix for moving the function to VideoUtils.  Waiting for tbpl to re-open for the next try run.
Attachment #8369475 - Attachment is obsolete: true
Try looked relatively good, so landed part 1 in mozilla-inbound:

  https://hg.mozilla.org/integration/mozilla-inbound/rev/c66ebc941479

Please leave open until I can land the remaining patches.
Whiteboard: [c=memory p=5 s= u=tarako][MemShrink:P1][tarako][clownshoes] → [c=memory p=5 s= u=tarako][MemShrink:P1][tarako][clownshoes][leave-open]
Blocks: 965369
This is needed for bug 965369 which is a blocker.  Nom for 1.3
blocking-b2g: --- → 1.3?
triage: 1.3+ for blocking bug 965369
blocking-b2g: 1.3? → 1.3+
Comment on attachment 8369586 [details] [diff] [review]
Part 1: Refactor MediaDecoderReader.h into separate headers. (v5)

This was previously r=cpearce.  Forgot to re-add the flag when I updated the patch.
Attachment #8369586 - Flags: review+
Updated for review issues.  Carry forward r=cpearce.
Attachment #8369251 - Attachment is obsolete: true
Attachment #8370307 - Flags: review+
Update tests to check for slop threshold instead of exact byte allocations.  This is more flexible and should let us pass tests even when trace_malloc is enabled and preventing us from using malloc_good_size().

I also added a test for the NativeCopy functor.  This was good since it caught an off-by-one error in one of the assertions.

Asking for re-review since its a non-trivially amount of changes to the file.
Attachment #8369252 - Attachment is obsolete: true
Attachment #8370310 - Flags: review?(cpearce)
Update the modeline as previously requested.
Attachment #8370310 - Attachment is obsolete: true
Attachment #8370310 - Flags: review?(cpearce)
Attachment #8370322 - Flags: review?(cpearce)
Attachment #8370322 - Flags: review?(cpearce) → review+
Whiteboard: [c=memory p=5 s= u=tarako][MemShrink:P1][tarako][clownshoes][leave-open] → [c=memory p=5 s= u=1.3+tarako] [MemShrink:P1][tarako][clownshoes][leave-open]
Update commit log for r=cpearce.
Attachment #8370515 - Flags: review+
Update commit log for r=cpearce.
Attachment #8369253 - Attachment is obsolete: true
Attachment #8369254 - Attachment is obsolete: true
Fix style issues.  Move conditional outside of tight copy loop.  Update commit log for r=cpearce.
Attachment #8369257 - Attachment is obsolete: true
Attachment #8370518 - Flags: review+
Attachment #8370517 - Flags: review+
Update comment to indicate AudioCompactor should not be used with apple MP3 API.  Update commit log for r=cpearce.
Attachment #8369256 - Attachment is obsolete: true
Attachment #8370519 - Flags: review+
One last try build for Part 2 through Part 7:

  https://tbpl.mozilla.org/?tree=Try&rev=b20e55551c72

Once this is green I will land these patches.  I will then work on the android patch.  Again, I will need help testing that patch as I do not have a device.
Can we get ETA to fix this bug? Thank you.
Whiteboard: [c=memory p=5 s= u=1.3+tarako] [MemShrink:P1][tarako][clownshoes][leave-open] → [c=memory p=5 s= u=1.3+tarako] [MemShrink:P1][tarako][clownshoes][leave-open][ETA:2/7]
Now that the parts relevant to b2g have landed in mozilla-central I'm marking 1.4 fixed.

Ryan, would it be possible to uplift Parts 1 to 4 to b2g28_v1.3 and b2g28_v1.3t now?  The bug is still open because I need to write an android patch, but that shouldn't effect b2g.

Sorry for the ni, but I didn't think your normal scripts would pick this up with the bug still open.
Flags: needinfo?(ryanvm)
It appears this might not stick.  I used size_t for the functor aSamples argument which should probably be a uint32_t instead.  This causes a std::min type mismatch which breaks the compiler on win64.

I think the most correct thing to do would be to backout Part 2 to Part 7 and reland with the correct type.

Clearing ni? until I can get this sorted.
Flags: needinfo?(ryanvm)
Convert aSamples argument to copy function to be uint32_t instead of size_t.
Attachment #8370307 - Attachment is obsolete: true
Attachment #8370807 - Flags: review+
Convert copy functor aSamples parameter to be uint32_t instead of size_t.  This should fix the win64 compile failure about mixing types in std::min().
Attachment #8370518 - Attachment is obsolete: true
Attachment #8370809 - Flags: review+
Patch to revert part 2 through part 7 from mozilla-central.

Try build combining this revert, plus fixed patches:

  https://tbpl.mozilla.org/?tree=Try&rev=0e5d9a07c5bc
It was making me nervous having it in, so I backed out 2-7 in https://hg.mozilla.org/mozilla-central/rev/d2d0899bb3bd
The try run passed the win64 build successfully.  I think I need to wait for the backout in comment 92 to get merged to b2g-inbound or mozilla-inbound before I land the patches again.
Attachment #8370811 - Attachment is obsolete: true
Use AudioCompactor in android PluginMediaReader.  This should look familiar as its essentially the exact same code as the omx codec.

Try run:

  https://tbpl.mozilla.org/?tree=Try&rev=afe057c9c807

I don't have a local build environment or device to test on.
Attachment #8370969 - Flags: review?(cpearce)
Fix a minor copy/paste naming mistake in the android MediaPluginReader use of AudioCompactor.  Lets not reference Omx in the copy type name as this is not the OMX codec.
Attachment #8370969 - Attachment is obsolete: true
Attachment #8370969 - Flags: review?(cpearce)
Attachment #8371449 - Flags: review?(cpearce)
The refactor in part 1 does not apply to b2g28_v1_3 cleanly.  I need to roll a branch specific patch.
Here is part 1 rebased to mozilla-b2g28_v1_3.
Ryan, could you merge parts 1, 2, and 4 on mozilla-b2g28_v1_3?  The bug is still open until the android specific code patch gets reviewed.

Parts 1 and 2 had to be rebased for the 1.3 branch.  See:

  attachment 8371627 [details] [diff] [review]
  attachment 8371630 [details] [diff] [review]

Part 4 applies cleanly:

  https://hg.mozilla.org/mozilla-central/rev/610a841538e8
Ryan, can you do a targeted uplift to v1.3 while the bug is still open for an android change?  See comment 101.  Thank you!
Flags: needinfo?(ryanvm)
Sure, you can just set the whiteboard like I did above for future bugs :)
Flags: needinfo?(ryanvm)
Whiteboard: [c=memory p=5 s= u=1.3+tarako] [MemShrink:P1][tarako][clownshoes][leave-open][ETA:2/7] → [c=memory p=5 s= u=1.3+tarako] [MemShrink:P1][tarako][clownshoes][leave-open][ETA:2/7][checkin-needed-b2g28]
Attachment #8371449 - Flags: review?(cpearce) → review+
Parts 1, 2, and 4 landed on v1.3.

https://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/8b9f478e1f50
https://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/91418373be74
https://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/7d50226ee796
Whiteboard: [c=memory p=5 s= u=1.3+tarako] [MemShrink:P1][tarako][clownshoes][leave-open][ETA:2/7][checkin-needed-b2g28] → [c=memory p=5 s= u=1.3+tarako] [MemShrink:P1][tarako][clownshoes][leave-open][ETA:2/7]
Botched the landing in mozilla-inbound:

  https://hg.mozilla.org/integration/mozilla-inbound/rev/de00c8bfc9b9

Then reverted:

  https://hg.mozilla.org/integration/mozilla-inbound/rev/22d850638009

And finally landed again:

  https://hg.mozilla.org/integration/mozilla-inbound/rev/ef1d09ac296a
Whiteboard: [c=memory p=5 s= u=1.3+tarako] [MemShrink:P1][tarako][clownshoes][leave-open][ETA:2/7] → [c=memory p=5 s= u=1.3+tarako] [MemShrink:P1][tarako][clownshoes][ETA:2/7]
And just to be explicit, once Part 8 is merged this bug can be marked fixed.  Thanks!
Thanks Ryan.

I think this is due to an additional ImageContainer.h include sneaking into MediaData.h during the rebase.  I didn't realize try server would work on this branch so did not run it through.  I'll get it checked now.
https://hg.mozilla.org/mozilla-central/rev/ef1d09ac296a
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
Ok, the try run in comment 109 is a bit ugly, but the reds look like failures in the repo sync to me.  There is a green JB emulator debug build, so I believe the build fix is good.

Lets try to land this on mozilla-b2g28_v1_3 again.  Patches are now:

  attachment 8372019 [details] [diff] [review]
  attachment 8371630 [details] [diff] [review]
  https://hg.mozilla.org/mozilla-central/rev/610a841538e8
Whiteboard: [c=memory p=5 s= u=1.3+tarako] [MemShrink:P1][tarako][clownshoes][ETA:2/7] → [c=memory p=5 s= u=1.3+tarako] [MemShrink:P1][tarako][clownshoes][ETA:2/7][checkin-needed-b2g28]
https://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/71b516693df4
https://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/85a39f463b2e
https://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/e27ed4af5197
Whiteboard: [c=memory p=5 s= u=1.3+tarako] [MemShrink:P1][tarako][clownshoes][ETA:2/7][checkin-needed-b2g28] → [c=memory p=5 s= u=1.3+tarako] [MemShrink:P1][tarako][clownshoes]
Depends on: 969826
Depends on: 970185
status-b2g-v1.3T?, remove [tarako] whiteboard
Whiteboard: [c=memory p=5 s= u=1.3+tarako] [MemShrink:P1][tarako][clownshoes] → [c=memory p=5 s= u=1.3+tarako] [MemShrink:P1][clownshoes]
You need to log in before you can comment on or make changes to this bug.