Bug 1127173 (Closed): opened 9 years ago, closed 9 years ago

crash in OOM | large | mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | stagefright::MPEG4Source::start(stagefright::MetaData*)

Categories: Core :: Audio/Video, defect, P1
Platform: x86, Windows NT
Type: defect
Status: RESOLVED FIXED
People: Reporter: away, Assigned: jya
References: Blocks 1 open bug
Keywords: crash

This bug was filed from the Socorro interface and is report bp-23df00d6-e4e8-449a-8542-403c22150125.
=============================================================

This is a top crash in early data from 36 beta 4 (it's hard to say about earlier 36 betas because of bug 1124892). Most of the URLs are YouTube.

Most of these OOMs are 3110400-byte allocations. On 32-bit Firefox we can't expect a contiguous block of that size to be available after a long uptime.

Bug 1066319 fixed some instances that had bogusly large allocations, but these 3MB allocations look legitimate, so I'm opening a new bug. :rillian, since you fixed the previous bug, can you take this one or recommend an owner?
Flags: needinfo?(giles)
3110400 bytes is one 1080p YUV frame, which at least narrows this down a bit. We discussed this today and are trying to get a reproduction we can test against. Unfortunately, we can't make every allocation in the HTML video playback code fallible in time for 36. Is this worse than the Flash crashes we've been getting from YouTube on 35?
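For reference, the arithmetic behind that number as a minimal C++ sketch. It assumes 8-bit YUV 4:2:0 (a full-resolution Y plane plus U and V planes subsampled by 2 in each direction, i.e. 1.5 bytes per pixel); `Yuv420FrameSize` is a hypothetical name, not a Gecko or stagefright function.

```cpp
#include <cstdint>

// Hypothetical helper (not from the Gecko/stagefright sources): size in bytes
// of one uncompressed 8-bit YUV 4:2:0 frame. The Y plane is width * height
// bytes; the U and V planes are each (width / 2) * (height / 2) bytes, so for
// even dimensions the total is width * height * 3 / 2.
constexpr std::uint64_t Yuv420FrameSize(std::uint64_t width, std::uint64_t height) {
  return width * height * 3 / 2;
}

// 1920 * 1080 = 2,073,600 luma bytes, plus 1,036,800 chroma bytes.
static_assert(Yuv420FrameSize(1920, 1080) == 3110400,
              "a 1080p YUV 4:2:0 frame is 3110400 bytes");
```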

Jean-Yves, do you mind taking a look at the stack trace? Maybe there's something simple we can make fallible to reduce the incidence.
Blocks: MSE
Flags: needinfo?(giles) → needinfo?(jyavenard)
Priority: -- → P1
This is the stagefright demuxer pre-allocating the maximum size it could ever read; it's super unlikely to ever need that much.
That buffer is used to read whole NAL units and parse them in memory.

We could allocate a much smaller block and resize it dynamically as we encounter bigger data (but never beyond the 3MB currently in place).
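A minimal sketch of that grow-on-demand idea in C++, assuming the buffer is only resized when a larger sample shows up and is capped at the current worst case. `GrowableSampleBuffer` and its methods are hypothetical names, not Gecko or stagefright APIs, and `std::realloc` stands in for whatever fallible allocator the real code would use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical sketch (not a Gecko or stagefright API): a sample buffer that
// starts empty, grows on demand, and never exceeds a hard cap such as the
// 3110400-byte worst case. Growth uses std::realloc, which returns nullptr on
// failure instead of aborting, so running out of memory becomes an error the
// demuxer can report.
class GrowableSampleBuffer {
 public:
  explicit GrowableSampleBuffer(std::size_t maxSize) : maxSize_(maxSize) {}
  ~GrowableSampleBuffer() { std::free(data_); }
  GrowableSampleBuffer(const GrowableSampleBuffer&) = delete;
  GrowableSampleBuffer& operator=(const GrowableSampleBuffer&) = delete;

  // Makes room for at least `needed` bytes. Returns false if `needed` exceeds
  // the cap or the allocation fails; in both cases the old contents survive.
  bool EnsureCapacity(std::size_t needed) {
    if (needed <= capacity_) return true;
    if (needed > maxSize_) return false;
    // Double the capacity (amortizing reallocations), clamping at the cap.
    std::size_t newCapacity = capacity_ ? capacity_ : 1;
    while (newCapacity < needed) {
      newCapacity = (newCapacity > maxSize_ / 2) ? maxSize_ : newCapacity * 2;
    }
    assert(newCapacity >= needed && newCapacity <= maxSize_);
    void* p = std::realloc(data_, newCapacity);
    if (!p) return false;  // realloc failed; data_ is still valid.
    data_ = static_cast<unsigned char*>(p);
    capacity_ = newCapacity;
    return true;
  }

  unsigned char* data() { return data_; }
  std::size_t capacity() const { return capacity_; }

 private:
  unsigned char* data_ = nullptr;
  std::size_t capacity_ = 0;
  std::size_t maxSize_;
};
```

With a 3110400-byte cap, a 100000-byte sample grows the buffer to 131072 bytes instead of reserving the full 3MB up front, and a request above the cap returns false rather than aborting the process.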

That size comes from the stsz atom; if the atom doesn't specify one, it is set to:
ALOGE("No width or height, assuming worst case 1080p");
mLastTrack->meta->setInt32(kKeyMaxInputSize, 3110400);

That worst-case value is used if we haven't parsed the dimensions yet. Otherwise, it is set to the size of an uncompressed YUV420 frame with those dimensions.
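The fallback order described above, sketched in C++ under the assumption that a non-zero stsz value always takes precedence. `TrackInfo`, `ChooseMaxInputSize` and `kWorstCase1080p` are hypothetical names; this is not the actual stagefright source.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical sketch of the fallback order; names are illustrative only.
constexpr std::int32_t kWorstCase1080p = 3110400;  // 1920 * 1080 * 3 / 2

struct TrackInfo {
  std::optional<std::int32_t> stszMaxSampleSize;  // from the stsz atom, if any
  std::optional<std::int32_t> width;              // set once dimensions are parsed
  std::optional<std::int32_t> height;
};

std::int32_t ChooseMaxInputSize(const TrackInfo& track) {
  // 1. Prefer the maximum sample size declared by the stsz atom.
  if (track.stszMaxSampleSize && *track.stszMaxSampleSize > 0) {
    return *track.stszMaxSampleSize;
  }
  // 2. Otherwise, one uncompressed YUV 4:2:0 frame at the known dimensions.
  if (track.width && track.height) {
    std::int64_t bytes = std::int64_t{*track.width} * *track.height * 3 / 2;
    assert(bytes >= 0 && bytes <= INT32_MAX);
    return static_cast<std::int32_t>(bytes);
  }
  // 3. Dimensions not parsed yet: assume the worst case, 1080p.
  return kWorstCase1080p;
}
```

For example, a track with no stsz size and 1280x720 dimensions gets 1382400 bytes, while one with neither a size nor dimensions gets the 3110400-byte worst case.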

Having said that, I think the core reason for these crashes has been found in bug 1127122.
Flags: needinfo?(jyavenard)
(In reply to Ralph Giles (:rillian) from comment #1)
> Is this worse than the flash crashes
> we've been getting from youtube on 35?

Yes. I haven't checked the volume, but with Flash crashes only the plugin process goes away, whereas a crash in our own code takes down all of Firefox, which is definitely worse.
What is the exact problem we're seeing?

1. We're using too much memory and 3MB is pushing us over some boundary.
2. We've fragmented memory/VM space so much that we can't allocate 3MB.

If it's #1, can we spend additional time understanding the memory usage? (And do we have about:memory dumps to help with that?) The 3MB allocation itself isn't unbounded, nor does it sound easy to fix.
If it's #2, did MSE/YouTube cause this problem to get worse? Are there memory usage patterns that we should be trying to fix specifically related to video/EME usage?
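To make the distinction between #1 and #2 concrete, a toy C++ model that treats free address space as a list of contiguous free regions. It is purely illustrative (real allocators and the 32-bit Windows address space behave far less simply), and `OomKind` and `Classify` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model, purely illustrative: free memory is a list of contiguous free
// regions, in bytes. A request succeeds only if one region is large enough.
enum class OomKind {
  kNone,           // some single region can satisfy the request
  kTooMuchUsage,   // case #1: not enough free memory in total
  kFragmentation,  // case #2: enough in total, but no single region is big enough
};

OomKind Classify(const std::vector<std::size_t>& freeRegions, std::size_t request) {
  std::size_t totalFree = 0;
  std::size_t largest = 0;
  for (std::size_t region : freeRegions) {
    totalFree += region;
    largest = std::max(largest, region);
  }
  assert(largest <= totalFree);
  if (largest >= request) return OomKind::kNone;
  if (totalFree < request) return OomKind::kTooMuchUsage;
  return OomKind::kFragmentation;
}
```

For instance, 100 separate 1 MiB holes add up to 100 MiB of free space, yet a 3110400-byte request still fails: that is case #2.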
Flags: needinfo?(dmajor)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #5)
> What is the exact problem we're seeing?
> 
> 1. We're using too much memory and 3MB is pushing us over some boundary.
> 2. We've fragmented memory/VM space so much that we can't allocate 3MB.

The proximate cause is the inability to allocate a contiguous 3MB block. In some cases, it may be exacerbated by memory usage (some of these reports have high write-combine usage, like bug 1062065).

We should go after both causes. Even if we fix the overall memory-usage regressions, the rest of the codebase already accommodates the fact that allocations over 1MB may fail, and these allocations will keep standing out in crash reports until this code does too.
Flags: needinfo?(dmajor)
Oh, I forgot to answer this bit:
> If it's #2, did MSE/youtube cause this problem to get worse?
My gut feeling is that this area of code is just newer and hasn't had this class of bug worked out of it yet.
I just hit this crash in my lab running on a Lenovo X1 Carbon: https://crash-stats.mozilla.com/report/index/bp-f3831f2a-9c28-4c5d-8981-d83ce2150130. I will try to reproduce it and grab a memory log if that helps.
Marcia's crash dump shows an alarming 2599MB of graphics memory. That machine is likely running into the issue seen in bug 1123465/bug 1127925.

I'm really starting to worry about the interaction between MSE and the gfx issues in 36. The two may be making each other worse. Even if we fixed this bug, Marcia would have likely just OOMed elsewhere (not to say that we shouldn't still fix this bug).
QA Contact: jyavenard
Assignee: nobody → jyavenard
This bug will be fixed once bug 1128410 lands.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
I can confirm that it's gone in 36.0b6.