1102642 - Large OOM in mozilla::MP4ContainerParser::ParseStartAndEndTimestamps

Reporter

Description

•

11 years ago

While attempting to reproduce the crash in Bug 1100063, I ran into this crash. I do not have reliable steps to reproduce. All I can say is that I was doing an excessive amount of seeking in YouTube videos trying to reproduce the crash in Bug 1100063. bp-b75a193d-c509-4dbb-93ed-e172e2141120

Damian Shaw [Quan]

Comment 1

•

11 years ago

Also reproduced this bug while seeking through YouTube videos, but no reliable STR 75e0d0a8-450a-400f-b6c6-caca82141221

Damian Shaw [Quan]

Updated

•

11 years ago

Status: UNCONFIRMED → NEW

Ever confirmed: true

Robert Kaiser

Comment 2

•

11 years ago

[Tracking Requested - why for this release]: I looked at sample of crashes from this signature - see https://crash-stats.mozilla.com/report/list?signature=OOM%20%7C%20large%20%7C%20mozalloc_abort%28char%20const%2A%20const%29%20%7C%20mozalloc_handle_oom%28unsigned%20int%29%20%7C%20moz_xmalloc%20%7C%20nsTArray_base%3CnsTArrayInfallibleAllocator%2C%20nsTArray_CopyWithMemutils%3E%3A%3AEnsureCapacity%28unsigned%20int%2C%20unsigned%20int%29%20%7C%20nsTArray_Impl%3Cunsigned%20char%2C%20nsTArra... for more reports - and I found that they all go through mozilla::MP4ContainerParser::ParseStartAndEndTimestamps. This is #6 with 2.3% of all DevEd 36 crashes in the last 3 days, I'm pretty sure this is MSE-related.

tracking-firefox36: --- → ?

Summary: crash [OOM | large | mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::EnsureCapacity(unsigned int, unsigned int)] → Large OOM in mozilla::MP4ContainerParser::ParseStartAndEndTimestamps

Marcia Knous [:marcia]

Comment 3

•

11 years ago

ni on Anthony - have we landed stuff on beta yet that would account for this crash?

Flags: needinfo?(ajones)

Marcia Knous [:marcia]

Comment 4

•

11 years ago

(In reply to Marcia Knous [:marcia - use needinfo] from comment #3) > ni on Anthony - have we landed stuff on beta yet that would account for this > crash? Sorry I meant Aurora/DevEd instead of Beta.

Anthony Jones (:ajones, :kentuckyfriedtakahe, :k17e)

Comment 5

•

11 years ago

Ralph - anything missing from 36?

Flags: needinfo?(giles)

Marcia Knous [:marcia]

Comment 6

•

11 years ago

From the stack it looks like this area might be involved: http://hg.mozilla.org/releases/mozilla-aurora/annotate/17c6cdb0f99f/dom/media/mediasource/ContainerParser.cpp#l250 http://bit.ly/1FnZT8K shows the first small spike using Build ID 2014123000. From there there have been small spikes using Build IDs 2015010300, 2015010400 and 2015010500.

Ralph Giles (:rillian)

Comment 7

•

11 years ago

There's a fair number of MediaSource changes we haven't backported. I'll see if I can find something.

Assignee: nobody → giles

Flags: needinfo?(giles)

Anthony Jones (:ajones, :kentuckyfriedtakahe, :k17e)

Updated

•

11 years ago

Flags: needinfo?(ajones)

Sylvestre Ledru [:Sylvestre]

Comment 8

•

11 years ago

Does this affect 37 too? Anyway, tracking for 36 as it is an important crash.

status-firefox36: --- → affected

status-firefox37: --- → ?

tracking-firefox36: ? → +

tracking-firefox37: --- → ?

Robert Kaiser

Updated

•

11 years ago

Keywords: crash

Lawrence Mandel [:lmandel] (use needinfo)

Comment 9

•

11 years ago

I'm going to assume that this affects 36+ until we know better.

status-firefox37: ? → affected

status-firefox38: --- → affected

tracking-firefox37: ? → +

tracking-firefox38: --- → +

Sylvestre Ledru [:Sylvestre]

Comment 10

•

11 years ago

Ralph, this bug is now 2% of our crashes. Have you been able to find a potential fix?

Flags: needinfo?(giles)

Ralph Giles (:rillian)

Comment 11

•

11 years ago

I haven't been able to figure this one out. Everything is uplifted to beta now, so keep me posted on statistics and whether it's beta 2 or beta 3.

Flags: needinfo?(giles)

Ralph Giles (:rillian)

Comment 12

•

11 years ago

Jya, Matt, do you have any ideas here? Does ~2M seem unreasonably large for an AppendBuffers call? That doesn't seem like it should be enough to OOM. Perhaps the length is getting corrupted? It looks like it's the mStream->AppendElements() in ParseStartAndEndTimestamps(), which is only called by SourceBuffer::AppendBuffer() via TrackBuffer::AppendBuffer(). If we could get a reproduction with NSPR_LOG_MODULES=MediaSource:5 we could confirm the theory.

Ralph Giles (:rillian)

Updated

•

11 years ago

Blocks: MSE

Priority: -- → P1

Ralph Giles (:rillian)

Updated

•

11 years ago

Flags: needinfo?(matt.woodrow)

Flags: needinfo?(jyavenard)

Emanuel Hoogeveen [:ehoogeveen]

Comment 13

•

11 years ago

(In reply to Ralph Giles (:rillian) from comment #12) > Jya, Matt, do you have any ideas here? Does ~2M seem unreasonably large for > an AppendBuffers call? That doesn't seem like it should be enough to OOM. On 32-bit builds, especially on a 32-bit OS, 2MiB can certainly OOM. Our allocator (jemalloc) tends to leave 1MiB-sized holes behind, so you can run out of 2MiB chunks well before all the address space is used up. Our chunk cache (bug 1073662) potentially makes this worse on Windows by holding onto 1MiB chunks of address space for later reuse (which can't be used for larger allocations). Even 1MiB allocations aren't great, since jemalloc will fail if the first 1MiB-sized chunk isn't 1MiB-aligned and there are no more 2MiB chunks left (bug 1005844). It would probably still be an improvement though, especially since 1MiB allocations would benefit from the chunk cache (but note that the chunk cache is at least temporarily going away as we transition to jemalloc3, bug 762449). I don't know this code at all, so feel free to disregard, but perhaps mozilla::SegmentedVector could be used to provide some encapsulation while avoiding large allocations?

Jean-Yves Avenard [:jya]

Assignee

Comment 14

•

11 years ago

(In reply to Ralph Giles (:rillian) from comment #12) > Jya, Matt, do you have any ideas here? Does ~2M seem unreasonably large for > an AppendBuffers call? That doesn't seem like it should be enough to OOM. > Perhaps the length is getting corrupted? > > It looks like it's the mStream->AppendElements() in > ParseStartAndEndTimestamps(), which is only called by > SourceBuffer::AppendBuffer() via TrackBuffer::AppendBuffer(). If we could > get a reproduction with NSPR_LOG_MODULES=MediaSource:5 we could confirm the > theory. I've seen plenty of videos / dash using 3MB segment (about 5s worth of 1080p videos). I've been testing MSE for month using this page: http://people.mozilla.org/~jyavenard/tests/mse_mp4/paper.html the media segment is 3MB.

Flags: needinfo?(jyavenard)

Marcia Knous [:marcia]

Comment 15

•

11 years ago

(In reply to Ralph Giles (:rillian) from comment #11) > I haven't been able to figure this one out. Everything is uplifted to beta > now, so keep me posted on statistics and whether it's beta 2 or beta 3. According to crash stats the large spike in these crashes occurred using 2015011412. Since that spike I don't see very many crashes in this stack on beta: http://bit.ly/1B0OpjI. The most recent beta I see has ID:20150122214638

Ralph Giles (:rillian)

Comment 16

•

11 years ago

(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #13) > On 32-bit builds, especially on a 32-bit OS, 2MiB can certainly OOM. Our > allocator (jemalloc) tends to leave 1MiB-sized holes behind, so you can run > out of 2MiB chunks well before all the address space is used up. Ouch. So much for infallible allocations. > I don't know this code at all, so feel free to disregard, but perhaps > mozilla::SegmentedVector could be used to provide some encapsulation while > avoiding large allocations? Doesn't look like a drop in replacement. We need random access, which could certainly be added, but I worry about the performance. See the mp4_demuxer::BufferedStream wrapper at https://dxr.mozilla.org/mozilla-central/source/media/libstagefright/binding/BufferStream.cpp#19 What about making mStream fallible here?

Robert Kaiser

Comment 17

•

11 years ago

From what I have heard in different places, we should make anything fallible that uses allocations >256k - so 1M or 2M should definitely be fallible if possible. In the end, we should try to fail gracefully rather than crash on large videos, I guess. ;-)

Ralph Giles (:rillian)

Comment 18

•

11 years ago

(In reply to Marcia Knous [:marcia - use needinfo] from comment #15) > According to crash stats the large spike in these crashes occurred using > 2015011412. Since that spike I don't see very many crashes in this stack on > beta: http://bit.ly/1B0OpjI. The most recent beta I see has > ID:20150122214638 I'd certainly like to believe the last two weeks of uplift fixed this, but Sylvestre said the opposite in #10, and that ID sounds 36b3? We'll know more after the breakpad bump from bug 1124892 lands.

Ralph Giles (:rillian)

Comment 19

•

11 years ago

(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #17) > From what I have heard in different places, we should make anything fallible > that uses allocations >256k I supposed I'd heard that before too. It's just not reasonable to call ~MB allocations 'large' where video is concerned.

Marcia Knous [:marcia]

Comment 20

•

11 years ago

(In reply to Ralph Giles (:rillian) from comment #18) > (In reply to Marcia Knous [:marcia - use needinfo] from comment #15) > > > According to crash stats the large spike in these crashes occurred using > > 2015011412. Since that spike I don't see very many crashes in this stack on > > beta: http://bit.ly/1B0OpjI. The most recent beta I see has > > ID:20150122214638 > > I'd certainly like to believe the last two weeks of uplift fixed this, but > Sylvestre said the opposite in #10, and that ID sounds 36b3? We'll know more > after the breakpad bump from bug 1124892 lands. http://ftp.mozilla.org/pub/mozilla.org/firefox/candidates/36.0b1-candidates/build2/win32_info.txt shows the build ID 2015011412 was beta 1, not beta 3. So this may no longer be an issue in the betas.

Ralph Giles (:rillian)

Comment 21

•

11 years ago

Oh, I see. You're saying you _haven't_ seen crashes with 20150122214638. Quoth Bobby Holly on email, "I don't think we need to reproduce this per se - we just need to do fallible allocation for any buffers whose size is controlled by JS (this is a standard rule), and certainly any buffer in the hundreds of k or larger."

Ralph Giles (:rillian)

Comment 22

•

11 years ago

Attached patch Make mp4_demuxer::BufferReader fallible (obsolete) — Details — Splinter Review

Like this?

Attachment #8554125 - Flags: review?(bobbyholley)

Ralph Giles (:rillian)

Comment 23

•

11 years ago

Comment on attachment 8554125 [details] [diff] [review] Make mp4_demuxer::BufferReader fallible I get instant crashes on youtube with this. Don't have time to debug tonight.

Attachment #8554125 - Flags: review?(bobbyholley)

Bobby Holley (:bholley)

Comment 24

•

11 years ago

Comment on attachment 8554125 [details] [diff] [review] Make mp4_demuxer::BufferReader fallible Review of attachment 8554125 [details] [diff] [review]: ----------------------------------------------------------------- I think we need to change the call signature of ParseStartAndEndTimestamps to disambiguate between "init segment not there" and "error that we should throw for". The API is pretty ugly right now, but the simplest possible solution is probably to just pass a bool* that the callee can set in the error cases, and that the caller can use to propagate a DOMException. This only covers high-level init segment parsing. The other thing we need to make fallible is ResourceQueue::AppendItem (and callers), which covers the regular SourceBuffer append case. That could probably go in a separate patch though.

Attachment #8554125 - Flags: review-

Matt Woodrow (:mattwoodrow)

Comment 25

•

11 years ago

That will conflict with jya's patches to make us run the buffer append algorithm asynchronously. The spec has explicit handling for insufficient memory during the prepare append algorithm (throw a QUOTA_EXCEEDED_ERR exception), but not for the same during the buffer append algorithm. I think that means we need to allocate all memory we're going to use during the prepare append algorithm, so that we know we have enough to run the asynchronous part. We could also just use the append error algorithm and put the MediaSource into an error state, this is the expected behaviour when running out of memory during a stream append.

Flags: needinfo?(matt.woodrow)

Jean-Yves Avenard [:jya]

Assignee

Comment 26

•

11 years ago

(In reply to Matt Woodrow (:mattwoodrow) from comment #25) > That will conflict with jya's patches to make us run the buffer append > algorithm asynchronously. > > The spec has explicit handling for insufficient memory during the prepare > append algorithm (throw a QUOTA_EXCEEDED_ERR exception), but not for the > same during the buffer append algorithm. > > I think that means we need to allocate all memory we're going to use during > the prepare append algorithm, so that we know we have enough to run the > asynchronous part. I was thinking the exact same thing yesterday as I was going over the async append patch once again (bug 1118589). With that patch the OOM would now happen right when the AppendDataRunnable is created which is not good. I was thinking that the only way to prevent a OOM down the chain, would be to allocate the memory in PrepareAppend, and store that in something similar to the mp4_demuxer::ByteBuffer which is a refcounted array. And subsequent usage of the AppendBuffer and further would be to re-use/share that data including the internal nsQueue data used by the SourceBufferResource. Even if we check for OOM when allocating in PrepareAppend, there's no guarantee there won't be more further down the track as right now, for each appendBuffer the data appended is temporarily duplicated at least three times later.

Jean-Yves Avenard [:jya]

Assignee

Comment 27

•

11 years ago

Attached patch Use shared data (obsolete) — Details — Splinter Review

This is what I had in mind. Not fully functional yet as I haven't completed ParseStartAndEndTimestamps BufferStream conversion into MP4Stream, (the MediaSourceResource doesn't allow to remove partial data). All buffers are now allocated just once, and then shared; nothing is duplicated

Jean-Yves Avenard [:jya]

Assignee

Updated

•

11 years ago

Assignee: giles → jyavenard

Status: NEW → ASSIGNED

Marcia Knous [:marcia]

Comment 29

•

11 years ago

Adding a new signature. This crash is reproducible in my Windows 8 lab machine.

Jean-Yves Avenard [:jya]

Assignee

Comment 30

•

11 years ago

Attached patch Use ref counted compressed data within mediasource — Details — Splinter Review

Use a ref counted, faillible TArray to store our data. Only MP4 is making full use of it at this stage.

Attachment #8554915 - Flags: review?(matt.woodrow)

Jean-Yves Avenard [:jya]

Assignee

Updated

•

11 years ago

Attachment #8554546 - Attachment is obsolete: true

Marcia Knous [:marcia]

Updated

•

11 years ago

Matt Woodrow (:mattwoodrow)

Updated

•

11 years ago

Attachment #8554915 - Flags: review?(matt.woodrow) → review+

Jean-Yves Avenard [:jya]

Assignee

Comment 31

•

11 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=748c4aaa4bbd

Matt Woodrow (:mattwoodrow)

Comment 32

•

11 years ago

(In reply to Marcia Knous [:marcia - use needinfo] from comment #29) > Adding a new signature. This crash is reproducible in my Windows 8 lab > machine. Can you see memory usage climbing before we crash? Could you try getting an about:memory report when the usage is high?

Flags: needinfo?(mozillamarcia.knous)

Jean-Yves Avenard [:jya]

Assignee

Comment 33

•

11 years ago

remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/b7b87042f254 remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/0e9b37fd47c6

Jean-Yves Avenard [:jya]

Assignee

Comment 34

•

11 years ago

Missed an explicit declaration: Added quickly in this push https://hg.mozilla.org/integration/mozilla-inbound/rev/d72836297b6b

Carsten Book [:Tomcat]

Comment 35

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/b7b87042f254 https://hg.mozilla.org/mozilla-central/rev/d72836297b6b

Status: ASSIGNED → RESOLVED

Closed: 11 years ago

status-firefox38: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla38

Botond Ballo [:botond] [away until Jun 29]

Updated

•

11 years ago

Depends on: 1126499

Botond Ballo [:botond] [away until Jun 29]

Comment 36

•

11 years ago

The addition of #include "MP4Stream.h" to ContainerParser.cpp in this patch is causing Lollipop build bustage ("MP4Stream.h: No such file or directory"). Filed bug 1126499 for this.

Ralph Giles (:rillian)

Updated

•

11 years ago

Attachment #8554125 - Attachment is obsolete: true

Ralph Giles (:rillian)

Comment 37

•

11 years ago

Comment on attachment 8554915 [details] [diff] [review] Use ref counted compressed data within mediasource Approval Request Comment [Feature/regressing bug #]: MSE [User impact if declined]: Crashes playing YouTube video. [Describe test coverage new/current, TreeHerder]: Landed on m-c. [Risks and why]: Almost entirely MSE-specific. Though not small. [String/UUID change made/needed]: None. NB this one may need some rebase love.

Attachment #8554915 - Flags: approval-mozilla-beta?

Attachment #8554915 - Flags: approval-mozilla-aurora?

Sylvestre Ledru [:Sylvestre]

Updated

•

11 years ago

Attachment #8554915 - Flags: approval-mozilla-beta?

Attachment #8554915 - Flags: approval-mozilla-beta+

Attachment #8554915 - Flags: approval-mozilla-aurora?

Attachment #8554915 - Flags: approval-mozilla-aurora+

Ryan VanderMeulen [:RyanVM]

Comment 38

•

11 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/8de0be1741d4 https://hg.mozilla.org/releases/mozilla-beta/rev/8b4f59c3ae71

status-firefox36: affected → fixed

status-firefox37: affected → fixed

Marcia Knous [:marcia]

Comment 39

•

11 years ago

(In reply to Matt Woodrow (:mattwoodrow) from comment #32) > (In reply to Marcia Knous [:marcia - use needinfo] from comment #29) > > Adding a new signature. This crash is reproducible in my Windows 8 lab > > machine. > > Can you see memory usage climbing before we crash? > > Could you try getting an about:memory report when the usage is high? I let the machine run overnight with logging - hope to have something this morning when I get in.

Marcia Knous [:marcia]

Comment 40

•

11 years ago

Attached file Memory log captured overnight while video was playing — Details

Here is a memory log which was run while the video was playing overnight. May not help, but since the crash is reproducible on this machine.

Flags: needinfo?(mozillamarcia.knous)

Bobby Holley (:bholley)

Comment 41

•

11 years ago

(In reply to Marcia Knous [:marcia - use needinfo] from comment #40) > Created attachment 8556009 [details] > Memory log captured overnight while video was playing > > Here is a memory log which was run while the video was playing overnight. > May not help, but since the crash is reproducible on this machine. This looks like a cycle collector log, which probably doesn't help here because these media buffers aren't in the cycle collection graph. The more interesting data would be the reports in about:memory.

[SV Manager] Florin Mezei, QA (:FlorinMezei)

Updated

•

11 years ago

Flags: qe-verify+

(Away)

Comment 42

•

11 years ago

(The messed up signature is this bug in disguise, due to bug 1124892)

Michael Vines [:m1] [:evilmachines]

Updated

•

11 years ago

Updated

•

11 years ago

Cornel Ionce [:noni]

Comment 43

•

11 years ago

The crash rates over the past two weeks are the following: - 38.0a1 Nightly: 3 crashes - 37.0a2 Aurora: 9 crashes - 36.0b6: 4 crashes (down from 601 crashes in 36.0b4). Given the low crash rates I consider it's now safe to close this issue.

Status: RESOLVED → VERIFIED

status-firefox36: fixed → verified

status-firefox37: fixed → verified

status-firefox38: fixed → verified

QA Contact: cornel.ionce

Make mp4_demuxer::BufferReader fallible 11 years ago Ralph Giles (:rillian) 2.32 KB, patch	bholley : review-	Details \| Diff \| Splinter Review
Use shared data 11 years ago Jean-Yves Avenard [:jya] 39.67 KB, patch		Details \| Diff \| Splinter Review
Use ref counted compressed data within mediasource 11 years ago Jean-Yves Avenard [:jya] 40.00 KB, patch	mattwoodrow : review+ Sylvestre : approval-mozilla-aurora+ Sylvestre : approval-mozilla-beta+	Details \| Diff \| Splinter Review
Memory log captured overnight while video was playing 11 years ago Marcia Knous [:marcia] 196.96 KB, text/x-log		Details