Closed
Bug 1102642
Opened 9 years ago
Closed 8 years ago
Large OOM in mozilla::MP4ContainerParser::ParseStartAndEndTimestamps
Categories
(Core :: Audio/Video, defect, P1)
Tracking
()
VERIFIED
FIXED
mozilla38
People
(Reporter: rowbot, Assigned: jya)
References
(Blocks 1 open bug)
Details
(Keywords: crash)
Crash Data
Attachments
(2 files, 2 obsolete files)
40.00 KB,
patch
|
mattwoodrow
:
review+
Sylvestre
:
approval-mozilla-aurora+
Sylvestre
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
196.96 KB,
text/x-log
|
Details |
While attempting to reproduce the crash in Bug 1100063, I ran into this crash. I do not have reliable steps to reproduce. All I can say is that I was doing an excessive amount of seeking in YouTube videos trying to reproduce the crash in Bug 1100063. bp-b75a193d-c509-4dbb-93ed-e172e2141120
Comment 1•8 years ago
|
||
Also reproduced this bug while seeking through YouTube videos, but no reliable STR 75e0d0a8-450a-400f-b6c6-caca82141221
Updated•8 years ago
|
Status: UNCONFIRMED → NEW
Ever confirmed: true
![]() |
||
Comment 2•8 years ago
|
||
[Tracking Requested - why for this release]: I looked at sample of crashes from this signature - see https://crash-stats.mozilla.com/report/list?signature=OOM%20%7C%20large%20%7C%20mozalloc_abort%28char%20const%2A%20const%29%20%7C%20mozalloc_handle_oom%28unsigned%20int%29%20%7C%20moz_xmalloc%20%7C%20nsTArray_base%3CnsTArrayInfallibleAllocator%2C%20nsTArray_CopyWithMemutils%3E%3A%3AEnsureCapacity%28unsigned%20int%2C%20unsigned%20int%29%20%7C%20nsTArray_Impl%3Cunsigned%20char%2C%20nsTArra... for more reports - and I found that they all go through mozilla::MP4ContainerParser::ParseStartAndEndTimestamps. This is #6 with 2.3% of all DevEd 36 crashes in the last 3 days, I'm pretty sure this is MSE-related.
tracking-firefox36:
--- → ?
Summary: crash [OOM | large | mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::EnsureCapacity(unsigned int, unsigned int)] → Large OOM in mozilla::MP4ContainerParser::ParseStartAndEndTimestamps
Comment 3•8 years ago
|
||
ni on Anthony - have we landed stuff on beta yet that would account for this crash?
Flags: needinfo?(ajones)
Comment 4•8 years ago
|
||
(In reply to Marcia Knous [:marcia - use needinfo] from comment #3) > ni on Anthony - have we landed stuff on beta yet that would account for this > crash? Sorry I meant Aurora/DevEd instead of Beta.
Ralph - anything missing from 36?
Flags: needinfo?(giles)
Comment 6•8 years ago
|
||
From the stack it looks like this area might be involved: http://hg.mozilla.org/releases/mozilla-aurora/annotate/17c6cdb0f99f/dom/media/mediasource/ContainerParser.cpp#l250 http://bit.ly/1FnZT8K shows the first small spike using Build ID 2014123000. From there there have been small spikes using Build IDs 2015010300, 2015010400 and 2015010500.
Comment 7•8 years ago
|
||
There's a fair number of MediaSource changes we haven't backported. I'll see if I can find something.
Assignee: nobody → giles
Flags: needinfo?(giles)
Updated•8 years ago
|
Flags: needinfo?(ajones)
Comment 8•8 years ago
|
||
Does this affect 37 too? Anyway, tracking for 36 as it is an important crash.
Comment 9•8 years ago
|
||
I'm going to assume that this affects 36+ until we know better.
status-firefox38:
--- → affected
tracking-firefox38:
--- → +
Comment 10•8 years ago
|
||
Ralph, this bug is now 2% of our crashes. Have you been able to find a potential fix?
Flags: needinfo?(giles)
Comment 11•8 years ago
|
||
I haven't been able to figure this one out. Everything is uplifted to beta now, so keep me posted on statistics and whether it's beta 2 or beta 3.
Flags: needinfo?(giles)
Comment 12•8 years ago
|
||
Jya, Matt, do you have any ideas here? Does ~2M seem unreasonably large for an AppendBuffers call? That doesn't seem like it should be enough to OOM. Perhaps the length is getting corrupted? It looks like it's the mStream->AppendElements() in ParseStartAndEndTimestamps(), which is only called by SourceBuffer::AppendBuffer() via TrackBuffer::AppendBuffer(). If we could get a reproduction with NSPR_LOG_MODULES=MediaSource:5 we could confirm the theory.
Updated•8 years ago
|
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(jyavenard)
Comment 13•8 years ago
|
||
(In reply to Ralph Giles (:rillian) from comment #12) > Jya, Matt, do you have any ideas here? Does ~2M seem unreasonably large for > an AppendBuffers call? That doesn't seem like it should be enough to OOM. On 32-bit builds, especially on a 32-bit OS, 2MiB can certainly OOM. Our allocator (jemalloc) tends to leave 1MiB-sized holes behind, so you can run out of 2MiB chunks well before all the address space is used up. Our chunk cache (bug 1073662) potentially makes this worse on Windows by holding onto 1MiB chunks of address space for later reuse (which can't be used for larger allocations). Even 1MiB allocations aren't great, since jemalloc will fail if the first 1MiB-sized chunk isn't 1MiB-aligned and there are no more 2MiB chunks left (bug 1005844). It would probably still be an improvement though, especially since 1MiB allocations would benefit from the chunk cache (but note that the chunk cache is at least temporarily going away as we transition to jemalloc3, bug 762449). I don't know this code at all, so feel free to disregard, but perhaps mozilla::SegmentedVector could be used to provide some encapsulation while avoiding large allocations?
Assignee | ||
Comment 14•8 years ago
|
||
(In reply to Ralph Giles (:rillian) from comment #12) > Jya, Matt, do you have any ideas here? Does ~2M seem unreasonably large for > an AppendBuffers call? That doesn't seem like it should be enough to OOM. > Perhaps the length is getting corrupted? > > It looks like it's the mStream->AppendElements() in > ParseStartAndEndTimestamps(), which is only called by > SourceBuffer::AppendBuffer() via TrackBuffer::AppendBuffer(). If we could > get a reproduction with NSPR_LOG_MODULES=MediaSource:5 we could confirm the > theory. I've seen plenty of videos / dash using 3MB segment (about 5s worth of 1080p videos). I've been testing MSE for month using this page: http://people.mozilla.org/~jyavenard/tests/mse_mp4/paper.html the media segment is 3MB.
Flags: needinfo?(jyavenard)
Comment 15•8 years ago
|
||
(In reply to Ralph Giles (:rillian) from comment #11) > I haven't been able to figure this one out. Everything is uplifted to beta > now, so keep me posted on statistics and whether it's beta 2 or beta 3. According to crash stats the large spike in these crashes occurred using 2015011412. Since that spike I don't see very many crashes in this stack on beta: http://bit.ly/1B0OpjI. The most recent beta I see has ID:20150122214638
Comment 16•8 years ago
|
||
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #13) > On 32-bit builds, especially on a 32-bit OS, 2MiB can certainly OOM. Our > allocator (jemalloc) tends to leave 1MiB-sized holes behind, so you can run > out of 2MiB chunks well before all the address space is used up. Ouch. So much for infallible allocations. > I don't know this code at all, so feel free to disregard, but perhaps > mozilla::SegmentedVector could be used to provide some encapsulation while > avoiding large allocations? Doesn't look like a drop in replacement. We need random access, which could certainly be added, but I worry about the performance. See the mp4_demuxer::BufferedStream wrapper at https://dxr.mozilla.org/mozilla-central/source/media/libstagefright/binding/BufferStream.cpp#19 What about making mStream fallible here?
![]() |
||
Comment 17•8 years ago
|
||
From what I have heard in different places, we should make anything fallible that uses allocations >256k - so 1M or 2M should definitely be fallible if possible. In the end, we should try to fail gracefully rather than crash on large videos, I guess. ;-)
Comment 18•8 years ago
|
||
(In reply to Marcia Knous [:marcia - use needinfo] from comment #15) > According to crash stats the large spike in these crashes occurred using > 2015011412. Since that spike I don't see very many crashes in this stack on > beta: http://bit.ly/1B0OpjI. The most recent beta I see has > ID:20150122214638 I'd certainly like to believe the last two weeks of uplift fixed this, but Sylvestre said the opposite in #10, and that ID sounds 36b3? We'll know more after the breakpad bump from bug 1124892 lands.
Comment 19•8 years ago
|
||
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #17) > From what I have heard in different places, we should make anything fallible > that uses allocations >256k I supposed I'd heard that before too. It's just not reasonable to call ~MB allocations 'large' where video is concerned.
Comment 20•8 years ago
|
||
(In reply to Ralph Giles (:rillian) from comment #18) > (In reply to Marcia Knous [:marcia - use needinfo] from comment #15) > > > According to crash stats the large spike in these crashes occurred using > > 2015011412. Since that spike I don't see very many crashes in this stack on > > beta: http://bit.ly/1B0OpjI. The most recent beta I see has > > ID:20150122214638 > > I'd certainly like to believe the last two weeks of uplift fixed this, but > Sylvestre said the opposite in #10, and that ID sounds 36b3? We'll know more > after the breakpad bump from bug 1124892 lands. http://ftp.mozilla.org/pub/mozilla.org/firefox/candidates/36.0b1-candidates/build2/win32_info.txt shows the build ID 2015011412 was beta 1, not beta 3. So this may no longer be an issue in the betas.
Comment 21•8 years ago
|
||
Oh, I see. You're saying you _haven't_ seen crashes with 20150122214638. Quoth Bobby Holly on email, "I don't think we need to reproduce this per se - we just need to do fallible allocation for any buffers whose size is controlled by JS (this is a standard rule), and certainly any buffer in the hundreds of k or larger."
Comment 23•8 years ago
|
||
Comment on attachment 8554125 [details] [diff] [review] Make mp4_demuxer::BufferReader fallible I get instant crashes on youtube with this. Don't have time to debug tonight.
Attachment #8554125 -
Flags: review?(bobbyholley)
Comment 24•8 years ago
|
||
Comment on attachment 8554125 [details] [diff] [review] Make mp4_demuxer::BufferReader fallible Review of attachment 8554125 [details] [diff] [review]: ----------------------------------------------------------------- I think we need to change the call signature of ParseStartAndEndTimestamps to disambiguate between "init segment not there" and "error that we should throw for". The API is pretty ugly right now, but the simplest possible solution is probably to just pass a bool* that the callee can set in the error cases, and that the caller can use to propagate a DOMException. This only covers high-level init segment parsing. The other thing we need to make fallible is ResourceQueue::AppendItem (and callers), which covers the regular SourceBuffer append case. That could probably go in a separate patch though.
Attachment #8554125 -
Flags: review-
Comment 25•8 years ago
|
||
That will conflict with jya's patches to make us run the buffer append algorithm asynchronously. The spec has explicit handling for insufficient memory during the prepare append algorithm (throw a QUOTA_EXCEEDED_ERR exception), but not for the same during the buffer append algorithm. I think that means we need to allocate all memory we're going to use during the prepare append algorithm, so that we know we have enough to run the asynchronous part. We could also just use the append error algorithm and put the MediaSource into an error state, this is the expected behaviour when running out of memory during a stream append.
Flags: needinfo?(matt.woodrow)
Assignee | ||
Comment 26•8 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #25) > That will conflict with jya's patches to make us run the buffer append > algorithm asynchronously. > > The spec has explicit handling for insufficient memory during the prepare > append algorithm (throw a QUOTA_EXCEEDED_ERR exception), but not for the > same during the buffer append algorithm. > > I think that means we need to allocate all memory we're going to use during > the prepare append algorithm, so that we know we have enough to run the > asynchronous part. I was thinking the exact same thing yesterday as I was going over the async append patch once again (bug 1118589). With that patch the OOM would now happen right when the AppendDataRunnable is created which is not good. I was thinking that the only way to prevent a OOM down the chain, would be to allocate the memory in PrepareAppend, and store that in something similar to the mp4_demuxer::ByteBuffer which is a refcounted array. And subsequent usage of the AppendBuffer and further would be to re-use/share that data including the internal nsQueue data used by the SourceBufferResource. Even if we check for OOM when allocating in PrepareAppend, there's no guarantee there won't be more further down the track as right now, for each appendBuffer the data appended is temporarily duplicated at least three times later.
Assignee | ||
Comment 27•8 years ago
|
||
This is what I had in mind. Not fully functional yet as I haven't completed ParseStartAndEndTimestamps BufferStream conversion into MP4Stream, (the MediaSourceResource doesn't allow to remove partial data). All buffers are now allocated just once, and then shared; nothing is duplicated
Assignee | ||
Updated•8 years ago
|
Assignee: giles → jyavenard
Status: NEW → ASSIGNED
Comment 29•8 years ago
|
||
Adding a new signature. This crash is reproducible in my Windows 8 lab machine.
Assignee | ||
Comment 30•8 years ago
|
||
Use a ref counted, faillible TArray to store our data. Only MP4 is making full use of it at this stage.
Attachment #8554915 -
Flags: review?(matt.woodrow)
Assignee | ||
Updated•8 years ago
|
Attachment #8554546 -
Attachment is obsolete: true
Updated•8 years ago
|
Crash Signature: , nsTArra... ] → , nsTArra... ]
[@ OOM | large | mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::EnsureCapacity(unsigned int, unsigned int) | nsTArray_Impl<bool, …
Updated•8 years ago
|
Attachment #8554915 -
Flags: review?(matt.woodrow) → review+
Assignee | ||
Comment 31•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=748c4aaa4bbd
Comment 32•8 years ago
|
||
(In reply to Marcia Knous [:marcia - use needinfo] from comment #29) > Adding a new signature. This crash is reproducible in my Windows 8 lab > machine. Can you see memory usage climbing before we crash? Could you try getting an about:memory report when the usage is high?
Flags: needinfo?(mozillamarcia.knous)
Assignee | ||
Comment 33•8 years ago
|
||
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/b7b87042f254 remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/0e9b37fd47c6
Assignee | ||
Comment 34•8 years ago
|
||
Missed an explicit declaration: Added quickly in this push https://hg.mozilla.org/integration/mozilla-inbound/rev/d72836297b6b
Comment 35•8 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/b7b87042f254 https://hg.mozilla.org/mozilla-central/rev/d72836297b6b
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Comment 36•8 years ago
|
||
The addition of #include "MP4Stream.h" to ContainerParser.cpp in this patch is causing Lollipop build bustage ("MP4Stream.h: No such file or directory"). Filed bug 1126499 for this.
Updated•8 years ago
|
Attachment #8554125 -
Attachment is obsolete: true
Comment 37•8 years ago
|
||
Comment on attachment 8554915 [details] [diff] [review] Use ref counted compressed data within mediasource Approval Request Comment [Feature/regressing bug #]: MSE [User impact if declined]: Crashes playing YouTube video. [Describe test coverage new/current, TreeHerder]: Landed on m-c. [Risks and why]: Almost entirely MSE-specific. Though not small. [String/UUID change made/needed]: None. NB this one may need some rebase love.
Attachment #8554915 -
Flags: approval-mozilla-beta?
Attachment #8554915 -
Flags: approval-mozilla-aurora?
Updated•8 years ago
|
Attachment #8554915 -
Flags: approval-mozilla-beta?
Attachment #8554915 -
Flags: approval-mozilla-beta+
Attachment #8554915 -
Flags: approval-mozilla-aurora?
Attachment #8554915 -
Flags: approval-mozilla-aurora+
Comment 38•8 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/8de0be1741d4 https://hg.mozilla.org/releases/mozilla-beta/rev/8b4f59c3ae71
Comment 39•8 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #32) > (In reply to Marcia Knous [:marcia - use needinfo] from comment #29) > > Adding a new signature. This crash is reproducible in my Windows 8 lab > > machine. > > Can you see memory usage climbing before we crash? > > Could you try getting an about:memory report when the usage is high? I let the machine run overnight with logging - hope to have something this morning when I get in.
Comment 40•8 years ago
|
||
Here is a memory log which was run while the video was playing overnight. May not help, but since the crash is reproducible on this machine.
Flags: needinfo?(mozillamarcia.knous)
Comment 41•8 years ago
|
||
(In reply to Marcia Knous [:marcia - use needinfo] from comment #40) > Created attachment 8556009 [details] > Memory log captured overnight while video was playing > > Here is a memory log which was run while the video was playing overnight. > May not help, but since the crash is reproducible on this machine. This looks like a cycle collector log, which probably doesn't help here because these media buffers aren't in the cycle collection graph. The more interesting data would be the reports in about:memory.
Updated•8 years ago
|
Flags: qe-verify+
![]() |
||
Comment 42•8 years ago
|
||
(The messed up signature is this bug in disguise, due to bug 1124892)
Crash Signature: , nsTArrayInfallib...] → , nsTArrayInfallib...]
[@ OOM | large | mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | xul.dll@0x278d96 | xul.dll@0x7bfa98 | xul.dll@0x10f419a | xul.dll@0x10f0227 | xul.dll@0x10f004a | xul.dll@0x1e8527 | xul.dll@0x1…
Crash Signature: xul.dll@0x10867e | xul.dll@0x286684...] → xul.dll@0x10867e | xul.dll@0x286684...]
[@ OOM | large | mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | xul.dll@0x26add6 | xul.dll@0x1150ab4 | xul.dll@0x10bcaa3 | xul.dll@0x10b8be6 | xul.dll@0x10b8a09 | xul.dll@0x2…
Comment 43•8 years ago
|
||
The crash rates over the past two weeks are the following: - 38.0a1 Nightly: 3 crashes - 37.0a2 Aurora: 9 crashes - 36.0b6: 4 crashes (down from 601 crashes in 36.0b4). Given the low crash rates I consider it's now safe to close this issue.
Status: RESOLVED → VERIFIED
QA Contact: cornel.ionce
You need to log in
before you can comment on or make changes to this bug.
Description
•