Crash in alloc::oom::oom

VERIFIED FIXED in Firefox 56

Status

()

Core
Audio/Video: Playback
P1
critical
VERIFIED FIXED
2 months ago
a month ago

People

(Reporter: calixte, Assigned: alfredo)

Tracking

({crash, regression, topcrash-win})

56 Branch
mozilla57
x86
Windows 7
crash, regression, topcrash-win
Points:
---
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox55 unaffected, firefox56blocking verified, firefox57 verified)

Details

(crash signature)

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(3 attachments, 3 obsolete attachments)

(Reporter)

Description

2 months ago
This bug was filed from the Socorro interface and is 
report bp-abe8ff63-50b2-463e-ba75-317740170811.
=============================================================

There 40 crashes in beta 56.0b1 from 37 installations.
:alfredo, could you investigate please ?
Flags: needinfo?(ayang)
(Assignee)

Comment 1

2 months ago
mp4parse_indice is a vector which keeps the sample data of a mp4 stream. In mp4 demuxer, the max length is about 7 days long video [1].
So the maximum memory consumption of mp4parse_indice table is about 800MB. The crash reports seem all happened on a 32 bit windows machines with 2G memory space. And the problem occurrs when the vector tries to double itself to extend size.

Hi, kinetik,
I can reduce the table size if it is on a 32 bit machine by something like 'target_pointer_width' [2].
Any suggestion?


[1] https://dxr.mozilla.org/mozilla-central/rev/5322c03f4c8587fe526172d3f87160031faa6d75/media/libstagefright/binding/mp4parse/src/lib.rs#33
[2] https://doc.rust-lang.org/reference/attributes.html#conditional-compilation
Flags: needinfo?(ayang) → needinfo?(kinetik)
(Assignee)

Updated

2 months ago
Assignee: nobody → ayang
(Assignee)

Comment 2

2 months ago
https://github.com/mozilla/mp4parse-rust/pull/109
(In reply to Alfredo Yang (:alfredo) from comment #1)
> I can reduce the table size if it is on a 32 bit machine by something like
> 'target_pointer_width' [2].
> Any suggestion?

I think long term we want fallible allocation so we can handle OOM gracefully.  There's some discussion about this among the Rust language people (https://internals.rust-lang.org/t/notes-interviews-on-fallible-collection-allocations/5619 is the latest, I think), but I don't know how far away a solution is.  It's possible we could find a fallible Vec crate and switch to that (or implement our own), if this continues to be a problem with the adjusted limit.

For a short term fix, adjusting the limit based on the architecture seems reasonable.
Flags: needinfo?(kinetik)
Another option is to avoid these potentially large allocations entirely by changing the interface to a (streaming) iterator based one, rather than returning the complete table.  This would require changes on the Gecko side, and I haven't investigated how feasible it is, but it may be worth considering.
Comment hidden (mozreview-request)
(Assignee)

Comment 6

2 months ago
(In reply to Matthew Gregan [:kinetik] from comment #4)
> Another option is to avoid these potentially large allocations entirely by
> changing the interface to a (streaming) iterator based one, rather than
> returning the complete table.  This would require changes on the Gecko side,
> and I haven't investigated how feasible it is, but it may be worth
> considering.

Yeah, I'll investigate it later. Thank you.

Comment 7

2 months ago
mozreview-review
Comment on attachment 8897316 [details]
Bug 1389470 - reduce table size on 32bit arch to avoid OOM.

https://reviewboard.mozilla.org/r/168620/#review173866
Attachment #8897316 - Flags: review?(kinetik) → review+

Comment 8

2 months ago
Pushed by ayang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/086e30b2dfab
reduce table size on 32bit arch to avoid OOM. r=kinetik

Comment 9

2 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/086e30b2dfab
Status: NEW → RESOLVED
Last Resolved: 2 months ago
status-firefox57: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Please request Beta approval on this when you're comfortable doing so.
Flags: needinfo?(ayang)
(Assignee)

Comment 11

2 months ago
Comment on attachment 8897316 [details]
Bug 1389470 - reduce table size on 32bit arch to avoid OOM.

Approval Request Comment
[Feature/Bug causing the regression]: mp4 sample table too large causes OOM
[User impact if declined]: OOM if user plays a very long mp4 video
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: no
[Needs manual test from QE? If yes, steps to reproduce]: please play a 24hr long mp4 video, it should be rejected at 32bit win7 without OOM.
[List of other uplifts needed for the feature/fix]: no
[Is the change risky?]: no, it only affects on 32bit win7.
[Why is the change risky/not risky?]: it reduces the table size only, from 7 days long mp4 video to 1 day.
[String changes made/needed]: none
Flags: needinfo?(ayang)
Attachment #8897316 - Flags: approval-mozilla-beta?
Hi Florin, could you help find someone to verify if this issue was fixed as expected on the latest Nightly build? Thanks!
Flags: qe-verify+
Flags: needinfo?(florin.mezei)
Comment on attachment 8897316 [details]
Bug 1389470 - reduce table size on 32bit arch to avoid OOM.

Avoiding an OOM sounds good, let's take this for beta 5. Verification would still be nice but I don't think it's necessary before uplift.
Attachment #8897316 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment 14

2 months ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-beta/rev/743f2fe4e06f
status-firefox56: affected → fixed
Flags: in-testsuite+
(Assignee)

Updated

2 months ago
Duplicate of this bug: 1392375
(Assignee)

Comment 16

2 months ago
Unfortunately, it doesn't fix it.

https://bugzilla.mozilla.org/show_bug.cgi?id=1392375#c4
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Gerry Chang [:gchang] from comment #12)
> Hi Florin, could you help find someone to verify if this issue was fixed as
> expected on the latest Nightly build? Thanks!

Removing needinfo since it still reproduces based on the previous comment.
Flags: needinfo?(florin.mezei)
Should we add the crash signature from bug 1392375 to this bug?
status-firefox56: fixed → affected
status-firefox57: fixed → affected
Flags: needinfo?(ayang)
(Assignee)

Comment 19

2 months ago
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #18)
> Should we add the crash signature from bug 1392375 to this bug?

Yes
Flags: needinfo?(ayang)
Crash Signature: [@ alloc::oom::oom] → [@ alloc::oom::oom] [@ alloc::oom::oom | alloc::raw_vec::RawVec<T>::double<T>]

Updated

2 months ago
Keywords: regression, topcrash-win
(Assignee)

Comment 20

2 months ago
Most of them happen on win32, I'll disable mp4 rust parser on win32 before Rust supports fallible collection allocation [1].

[1] https://github.com/rust-lang/rfcs/pull/2116
Comment hidden (mozreview-request)
(Assignee)

Updated

2 months ago
Attachment #8897316 - Attachment is obsolete: true

Comment 22

2 months ago
mozreview-review
Comment on attachment 8904148 [details]
Bug 1389470 - disable rust parser on win32 due to OOM.

https://reviewboard.mozilla.org/r/175928/#review181092

Bummer :-(
Attachment #8904148 - Flags: review?(kinetik) → review+
Tracking for 56 to make sure we remember to uplift/land this patch.
tracking-firefox56: --- → +
(Assignee)

Comment 24

2 months ago
(In reply to Matthew Gregan [:kinetik] from comment #22)
> Comment on attachment 8904148 [details]
> Bug 1389470 - disable rust parser on win32 due to OOM.
> 
> https://reviewboard.mozilla.org/r/175928/#review181092
> 
> Bummer :-(

I don't like it either, really.

So I'm thinking other alternatives. I've checked these crash reports, most of OOM happening when vector tries to double its size via push(). It may improve performance and reduce memory pressure if we can use Vec::with_capacity instead of Vec::push [1] in mp4 tables.
The remaining problem is we need to find a way to detect memory pressure before calling Vec::with_capacity.

[1] http://markusjais.com/unterstanding-rusts-vec-and-its-capacity-for-fast-and-efficient-programs/
We only have a couple of beta builds/releases left so there is some time pressure. This should probably be a blocking issue for 56 given the high volume.
tracking-firefox56: + → blocking
(In reply to Alfredo Yang (:alfredo) from comment #24)
> So I'm thinking other alternatives. I've checked these crash reports, most
> of OOM happening when vector tries to double its size via push(). It may
> improve performance and reduce memory pressure if we can use
> Vec::with_capacity instead of Vec::push [1] in mp4 tables.
> The remaining problem is we need to find a way to detect memory pressure
> before calling Vec::with_capacity.

FWIW, the Stylo folks have been working on writing portable extensions to Vec that enable a subset of fallible operations.   You may want to look around the tree and/or talk to them to see if you can reuse their work here.
Priority: -- → P1
(Assignee)

Comment 27

2 months ago
(In reply to Nathan Froyd [:froydnj] from comment #26)
> FWIW, the Stylo folks have been working on writing portable extensions to
> Vec that enable a subset of fallible operations.   You may want to look
> around the tree and/or talk to them to see if you can reuse their work here.

Thanks, I've checked it with :heycam and I'll reuse their works on bug 1389009.
(Assignee)

Comment 28

a month ago
https://github.com/mozilla/mp4parse-rust/pull/114
(Assignee)

Comment 29

a month ago
https://github.com/mozilla/mp4parse-rust/pull/115
Comment hidden (mozreview-request)
(Assignee)

Updated

a month ago
Attachment #8904148 - Attachment is obsolete: true

Comment 31

a month ago
mozreview-review
Comment on attachment 8906976 [details]
Bug 1389470 - use fallible memory allocation to avoid OOM.

https://reviewboard.mozilla.org/r/178734/#review183722
Attachment #8906976 - Flags: review?(kinetik) → review+
(Assignee)

Comment 32

a month ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=63d4c8a8d0a50942ba5b444fbfe927cc060c58bc
Heads up that the beta 12 build will start on Wednesday evening. If we miss that, this can still land for the RC build on Monday.

Comment 34

a month ago
We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again.

hg error in cmd: hg rebase -s 110483b3a613 -d 8475a8df9f16: rebasing 419427:110483b3a613 "Bug 1389470 - use fallible memory allocation to avoid OOM. r=kinetik" (tip)
merging toolkit/library/gtest/rust/Cargo.lock
merging toolkit/library/rust/Cargo.lock
warning: conflicts while merging toolkit/library/gtest/rust/Cargo.lock! (edit, then use 'hg resolve --mark')
warning: conflicts while merging toolkit/library/rust/Cargo.lock! (edit, then use 'hg resolve --mark')
unresolved conflicts (see hg resolve, then hg rebase --continue)
(Assignee)

Comment 35

a month ago
(In reply to Mozilla Autoland from comment #34)
> We're sorry, Autoland could not rebase your commits for you automatically.
> Please manually rebase your commits and try again.
> 
> hg error in cmd: hg rebase -s 110483b3a613 -d 8475a8df9f16: rebasing
> 419427:110483b3a613 "Bug 1389470 - use fallible memory allocation to avoid
> OOM. r=kinetik" (tip)
> merging toolkit/library/gtest/rust/Cargo.lock
> merging toolkit/library/rust/Cargo.lock
> warning: conflicts while merging toolkit/library/gtest/rust/Cargo.lock!
> (edit, then use 'hg resolve --mark')
> warning: conflicts while merging toolkit/library/rust/Cargo.lock! (edit,
> then use 'hg resolve --mark')
> unresolved conflicts (see hg resolve, then hg rebase --continue)

fallible conflicts with the stylo one, I'll change its name since we modified it a lot already.
(Assignee)

Comment 36

a month ago
https://github.com/mozilla/mp4parse-rust/pull/116
Comment hidden (mozreview-request)
(Assignee)

Updated

a month ago
Attachment #8906976 - Attachment is obsolete: true

Comment 38

a month ago
mozreview-review
Comment on attachment 8907435 [details]
Bug 1389470 - use fallible memory allocation to avoid OOM.

https://reviewboard.mozilla.org/r/179122/#review184348
Attachment #8907435 - Flags: review?(kinetik) → review+

Comment 39

a month ago
Pushed by ayang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a3b2778fa597
use fallible memory allocation to avoid OOM. r=kinetik
Backed out for causing new LSAN leaks.
https://treeherder.mozilla.org/logviewer.html#?job_id=130684652&repo=autoland

https://hg.mozilla.org/integration/autoland/rev/d9a39e5210d07b16cedf7f6b3afc9a850927b631
Flags: needinfo?(ayang)

Updated

a month ago
Crash Signature: [@ alloc::oom::oom] [@ alloc::oom::oom | alloc::raw_vec::RawVec<T>::double<T>] → [@ alloc::oom::oom] [@ alloc::oom::oom | alloc::raw_vec::RawVec<T>::double<T>] [@ OOM | unknown | alloc::oom::oom | alloc::raw_vec::RawVec<T>::double<T>]
Comment hidden (obsolete)

Comment 42

a month ago
backout
Oops, didn't get the backout included in the merge:

Cherry-picked it to https://hg.mozilla.org/mozilla-central/rev/0e706b53052c86b247bb8037d1a3fbf2e913045e
Status: RESOLVED → REOPENED
status-firefox57: fixed → affected
Resolution: FIXED → ---
Target Milestone: mozilla57 → ---
Can you request uplift? Thanks.
(In reply to Alfredo Yang (:alfredo) from comment #27)
> Thanks, I've checked it with :heycam and I'll reuse their works on bug
> 1389009.

Is this going to work on Beta?
Leak fix: https://github.com/mozilla/mp4parse-rust/pull/117
Comment hidden (mozreview-request)

Comment 47

a month ago
mozreview-review
Comment on attachment 8907908 [details]
Bug 1389470 - fix for zero size memory leak.

https://reviewboard.mozilla.org/r/179584/#review184778
Attachment #8907908 - Flags: review?(kinetik) → review+
Mass change P1->P2 to align with new Mozilla triage process
Priority: P1 → P2

Comment 49

a month ago
Pushed by ayang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8d242a53eca7
use fallible memory allocation to avoid OOM. r=kinetik
https://hg.mozilla.org/integration/autoland/rev/4f22ef5d8058
fix for zero size memory leak. r=kinetik
https://hg.mozilla.org/mozilla-central/rev/8d242a53eca7
https://hg.mozilla.org/mozilla-central/rev/4f22ef5d8058
Status: REOPENED → RESOLVED
Last Resolved: a month agoa month ago
status-firefox57: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
This is a release blocking issue so I think P1 is correct here.
Priority: P2 → P1
Alfredo (or kinetik if you see this first) Can you request uplift to m-r?  I'd like to get this patch into the release candidate build on Monday.  

Andrei, can your team verify the fix with the STR in comment 11? Thanks.
Flags: needinfo?(kinetik)
Flags: needinfo?(andrei.vaida)
I assume bug 1389009 will also need an uplift request for this patch to work.
Anthony, can you help out here to get things straightened out for Monday's 56 RC build, since this is a release blocking issue?
Flags: needinfo?(ajones)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #53)
> I assume bug 1389009 will also need an uplift request for this patch to work.

Not by my reading of the patch. It looks like they just copy-pasted the relevant code into mp4parse (which is fine).
Created attachment 8908900 [details] [diff] [review]
bug1389470_uplift.patch

This uplifts 8d242a53eca7 and 4f22ef5d8058 from central.  I didn't update update-rust.sh's embedded revision since we're just uplifting the fallible Vec changes.

Approval Request Comment
[Feature/Bug causing the regression]: inherent in Rust language design, so problem introduced when mp4parse was written in Rust
[User impact if declined]: crash on OOM, fairly common on 32-bit
[Is this code covered by automated tests?]: yes
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: n/a
[List of other uplifts needed for the feature/fix]: n/a
[Is the change risky?]: not particularly
[Why is the change risky/not risky?]: affects core Vec allocation, so code executed on every use of MP4 parser
[String changes made/needed]: none
Flags: needinfo?(kinetik)
Attachment #8908900 - Flags: review+
Attachment #8908900 - Flags: approval-mozilla-release?
Comment on attachment 8908900 [details] [diff] [review]
bug1389470_uplift.patch

Avoid an mp4 crash, let's uplift to m-r.
Attachment #8908900 - Flags: approval-mozilla-release? → approval-mozilla-release+

Comment 58

a month ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-beta/rev/264ce7594ce8 (FIREFOX_56b13_RELBRANCH)
https://hg.mozilla.org/releases/mozilla-release/rev/9d567bd62b2c
status-firefox56: affected → fixed
Matthew,
Thanks for your help!
Flags: needinfo?(ayang)
Flags: needinfo?(ajones)
I have managed to reproduce this crash [1], [2], [3] using a 12hr long mp4 video played simultaneously in a few tabs (8 or 10 tabs), alongside with other youtube videos. Tested on Windows 7 x86, Beta 56.0b1:
[1] https://crash-stats.mozilla.com/report/index/faf72fef-3fe6-4fa5-a83b-da25c0170921
[2] https://crash-stats.mozilla.com/report/index/0507ae7a-98d1-4c9c-bc83-f0d450170921
[3] https://crash-stats.mozilla.com/report/index/777f27e1-592e-4ab5-910b-50cfe0170921

I can't reproduce this crash anymore with 56.0 RC (20170918210324) and latest Nightly 57.01 (2017-09-21) on Windows 7 x86. (following the same scenario from above).
Status: RESOLVED → VERIFIED
status-firefox56: fixed → verified
status-firefox57: fixed → verified
Flags: needinfo?(andrei.vaida)
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.