Closed Bug 1057757 Opened 5 years ago Closed Last year

Large OOM in mozilla::layers::BufferRecycleBin::GetBuffer (during video decoding)

Categories

(Core :: Audio/Video: Playback, defect, critical)

32 Branch
x86
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox48 --- affected

People

(Reporter: alex_mayorga, Unassigned)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-da4d4c8e-d655-4ae4-86f2-ce9302140823.
=============================================================
Relevant stack frames, apparently this was while decoding a WebM video:

3 	xul.dll 	mozilla::layers::BufferRecycleBin::GetBuffer(unsigned int) 	gfx/layers/ImageContainer.cpp
4 	xul.dll 	mozilla::layers::PlanarYCbCrImage::AllocateBuffer(unsigned int) 	gfx/layers/ImageContainer.cpp
5 	xul.dll 	mozilla::layers::PlanarYCbCrImage::CopyData(mozilla::layers::PlanarYCbCrData const&) 	gfx/layers/ImageContainer.cpp
6 	xul.dll 	mozilla::layers::PlanarYCbCrImage::SetData(mozilla::layers::PlanarYCbCrData const&) 	gfx/layers/ImageContainer.cpp
7 	xul.dll 	mozilla::VideoData::SetVideoDataToImage(mozilla::layers::PlanarYCbCrImage*, mozilla::VideoInfo&, mozilla::VideoData::YCbCrBuffer const&, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&, bool) 	content/media/MediaData.cpp
8 	xul.dll 	mozilla::VideoData::Create(mozilla::VideoInfo&, mozilla::layers::ImageContainer*, mozilla::layers::Image*, __int64, __int64, __int64, mozilla::VideoData::YCbCrBuffer const&, bool, __int64, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&) 	content/media/MediaData.cpp
9 	xul.dll 	mozilla::VideoData::Create(mozilla::VideoInfo&, mozilla::layers::ImageContainer*, __int64, __int64, __int64, mozilla::VideoData::YCbCrBuffer const&, bool, __int64, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&) 	content/media/MediaData.cpp
10 	xul.dll 	mozilla::WebMReader::DecodeVideoFrame(bool&, __int64) 	content/media/webm/WebMReader.cpp
Component: General → Graphics: Layers
Product: Firefox → Core
Summary: crash in OOM | large | mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | mozilla::layers::BufferRecycleBin::GetBuffer(unsigned int) → Large OOM in mozilla::layers::BufferRecycleBin::GetBuffer (during WebM video decoding)
I found that this is something that early adopters to our releases run into more often, it makes the top 10 of crashes in 32.0.3 for example, see this query for more reports: https://crash-stats.mozilla.com/report/list?signature=OOM+%7C+large+%7C+mozalloc_abort%28char+const%2A+const%29+%7C+mozalloc_handle_oom%28unsigned+int%29+%7C+moz_xmalloc+%7C+mozilla%3A%3Alayers%3A%3ABufferRecycleBin%3A%3AGetBuffer%28unsigned+int%29#reports

Looks like there's different paths there, a lot of them via mozilla::OggReader::DecodeTheora but I also some via mozilla::WMFReader::DecodeVideoFrame - all are video decoding, though and the same after mozilla::VideoData::Create.

Because of that, moving to Video/Audio component and asking cpearce for a look into this as he did a number of patches and reviews in MediaData.cpp recently.
Component: Graphics: Layers → Video/Audio
Flags: needinfo?(cpearce)
Summary: Large OOM in mozilla::layers::BufferRecycleBin::GetBuffer (during WebM video decoding) → Large OOM in mozilla::layers::BufferRecycleBin::GetBuffer (during video decoding)
It's a bunch of work to fix this, and we've currently don't have the resources to address this immediately.
Depends on: 963922
Flags: needinfo?(cpearce)
Well in the meantime we can try to make it fallible.

BufferRecycleBin::GetBuffer just passes the result up
PlanarYCbCrImage::AllocateBuffer just passes the result up
PlanarYCbCrImage::CopyData has a null check

Does this sound reasonable? (I don't know what are the more 'global' consequences of doing this)
Attachment #8496613 - Flags: review?(bas)
(I checked the other callers too -- GetBuffer's only caller is AllocateData, and all of AllocateData's callers do a null check)
Whiteboard: [MemShrink]
Comment on attachment 8496613 [details] [diff] [review]
Fallible allocation

Review of attachment 8496613 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/ImageContainer.cpp
@@ +116,5 @@
>  {
>    MutexAutoLock lock(mLock);
>  
>    if (mRecycledBuffers.IsEmpty() || mRecycledBufferSize != aSize)
> +    return new ((fallible_t())) uint8_t[aSize]; // These buffers can be large

Do all the callers here know how to deal with this returning nullptr? Also, why the double braces?
(In reply to Bas Schouten (:bas.schouten) from comment #6)
> Do all the callers here know how to deal with this returning nullptr?
Yes, at least superficially. But maybe you can tell me if there is lurking danger. 

e.g. is it OK not to (re)set mBufferSize here? http://hg.mozilla.org/mozilla-central/file/5e704397529b/gfx/layers/ImageContainer.cpp#l574
or here? http://hg.mozilla.org/mozilla-central/file/5e704397529b/gfx/layers/ImageContainer.cpp#l527
or not set mSize here? http://hg.mozilla.org/mozilla-central/file/5e704397529b/gfx/layers/basic/BasicImages.cpp#l122


> Also, why the double braces?
With single braces the compiler gets upset: error C2066: cast to function type is illegal
I took the double braces idea from: hg.mozilla.org/mozilla-central/file/5e704397529b/modules/libjar/nsZipArchive.cpp#l1160

Another possible solution is:
  fallible_t fallible;
  return new(fallible) ...
Doesn't need to be MemShrink, because it doesn't reduce memory usage per se.
Whiteboard: [MemShrink]
Comment on attachment 8496613 [details] [diff] [review]
Fallible allocation

Review of attachment 8496613 [details] [diff] [review]:
-----------------------------------------------------------------

I'd prefer it if we reset those size members when this fails. I don't think this would currently go wrong but it seems like a little bit of a footgun.
Attachment #8496613 - Flags: review?(bas) → review-
n-i in case this was forgotten.
Flags: needinfo?(dmajor)
This has fallen off my priority list. Anyone is welcome to take it from here.
Flags: needinfo?(dmajor)
This signature has ~700 crashes in the last week.  Also, note that a bunch are WebRTC video (typically frames being captured from the camera at VGA = 460800 bytes in YUV 4:2:0.)

Allocating video buffers should never be infallible - they're big, contiguous, and the system likely will be fine if they fail (though you'll glitch probably).  It *does* point to general OOM issues and likely fragmentation of remaining memory; at some point you trip over yourself and an allocation will fail or it gets so slow you might as well kill it.  Making this fallible will delay the problem; if the user doesn't kill things likely eventually they'll OOM elsewhere - but we shouldn't add to the problem.

This assumes we're not leaking them somewhere, which I assume we're not, or holding them where we don't really need them.
Can you point to a query or some crash IDs? I don't see this on any version beyond 38.0.5.
Flags: needinfo?(rjesup)
(In reply to David Major [:dmajor] from comment #13)
> Can you point to a query or some crash IDs? I don't see this on any version
> beyond 38.0.5.

Hadn't noticed that; QA had flagged they hit it in moztrap tests, but hadn't said what browser version - I assumed something newer than release.  Good!   Can we close this then?  How was it addressed?
Flags: needinfo?(rjesup)
As far as I can tell the code still looks susceptible to an OOM. Either something indirect changed, or the bug is still there and just not happening in practice.
Component: Audio/Video → Audio/Video: Playback
Crash Signature: [@ OOM | large | mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | mozilla::layers::BufferRecycleBin::GetBuffer(unsigned int)] → [@ OOM | large | mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | mozilla::layers::BufferRecycleBin::GetBuffer(unsigned int)] [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xmalloc | mozilla::layers::Buff…
On Fennec this is still happening on 47, 15 times in the past week.

On Desktop it's limited to older versions. Mostly versions in the 20s and 30s, though there were 15 occurrences in 43.0.1.
[@ OOM | large | mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | mozilla::layers::BufferRecycleBin::GetBuffer(unsigned int)] has disappeared

[@ OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xmalloc | mozilla::layers::BufferRecycleBin::GetBuffer] still has 340 crashes in the past week at https://crash-stats.mozilla.com/signature/?signature=OOM%20%7C%20large%20%7C%20mozalloc_abort%20%7C%20mozalloc_handle_oom%20%7C%20moz_xmalloc%20%7C%20mozilla%3A%3Alayers%3A%3ABufferRecycleBin%3A%3AGetBuffer the most recent version is FennecAndroid 48.0

https://crash-stats.mozilla.com/report/index/558c1573-4dc2-47b5-a197-391352160913
Crash Signature: [@ OOM | large | mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | mozilla::layers::BufferRecycleBin::GetBuffer(unsigned int)] [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xmalloc | mozilla::layers::Buff… → [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xmalloc | mozilla::layers::BufferRecycleBin::GetBuffer]
Closing because no crash reported since 12 weeks.
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.