Closed Bug 1389470 Opened 7 years ago Closed 7 years ago

Crash in alloc::oom::oom

Categories

(Core :: Audio/Video: Playback, defect, P1)

56 Branch
x86
Windows 7
defect

Tracking

()

VERIFIED FIXED
mozilla57
Tracking Status
firefox-esr52 --- unaffected
firefox55 --- unaffected
firefox56 blocking verified
firefox57 --- verified

People

(Reporter: calixte, Assigned: ayang)

References

Details

(Keywords: crash, regression, topcrash-win)

Crash Data

Attachments

(3 files, 3 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-abe8ff63-50b2-463e-ba75-317740170811.
=============================================================

There 40 crashes in beta 56.0b1 from 37 installations.
:alfredo, could you investigate please ?
Flags: needinfo?(ayang)
mp4parse_indice is a vector which keeps the sample data of a mp4 stream. In mp4 demuxer, the max length is about 7 days long video [1].
So the maximum memory consumption of mp4parse_indice table is about 800MB. The crash reports seem all happened on a 32 bit windows machines with 2G memory space. And the problem occurrs when the vector tries to double itself to extend size.

Hi, kinetik,
I can reduce the table size if it is on a 32 bit machine by something like 'target_pointer_width' [2].
Any suggestion?


[1] https://dxr.mozilla.org/mozilla-central/rev/5322c03f4c8587fe526172d3f87160031faa6d75/media/libstagefright/binding/mp4parse/src/lib.rs#33
[2] https://doc.rust-lang.org/reference/attributes.html#conditional-compilation
Flags: needinfo?(ayang) → needinfo?(kinetik)
Assignee: nobody → ayang
(In reply to Alfredo Yang (:alfredo) from comment #1)
> I can reduce the table size if it is on a 32 bit machine by something like
> 'target_pointer_width' [2].
> Any suggestion?

I think long term we want fallible allocation so we can handle OOM gracefully.  There's some discussion about this among the Rust language people (https://internals.rust-lang.org/t/notes-interviews-on-fallible-collection-allocations/5619 is the latest, I think), but I don't know how far away a solution is.  It's possible we could find a fallible Vec crate and switch to that (or implement our own), if this continues to be a problem with the adjusted limit.

For a short term fix, adjusting the limit based on the architecture seems reasonable.
Flags: needinfo?(kinetik)
Another option is to avoid these potentially large allocations entirely by changing the interface to a (streaming) iterator based one, rather than returning the complete table.  This would require changes on the Gecko side, and I haven't investigated how feasible it is, but it may be worth considering.
(In reply to Matthew Gregan [:kinetik] from comment #4)
> Another option is to avoid these potentially large allocations entirely by
> changing the interface to a (streaming) iterator based one, rather than
> returning the complete table.  This would require changes on the Gecko side,
> and I haven't investigated how feasible it is, but it may be worth
> considering.

Yeah, I'll investigate it later. Thank you.
Comment on attachment 8897316 [details]
Bug 1389470 - reduce table size on 32bit arch to avoid OOM.

https://reviewboard.mozilla.org/r/168620/#review173866
Attachment #8897316 - Flags: review?(kinetik) → review+
Pushed by ayang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/086e30b2dfab
reduce table size on 32bit arch to avoid OOM. r=kinetik
https://hg.mozilla.org/mozilla-central/rev/086e30b2dfab
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Please request Beta approval on this when you're comfortable doing so.
Flags: needinfo?(ayang)
Comment on attachment 8897316 [details]
Bug 1389470 - reduce table size on 32bit arch to avoid OOM.

Approval Request Comment
[Feature/Bug causing the regression]: mp4 sample table too large causes OOM
[User impact if declined]: OOM if user plays a very long mp4 video
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: no
[Needs manual test from QE? If yes, steps to reproduce]: please play a 24hr long mp4 video, it should be rejected at 32bit win7 without OOM.
[List of other uplifts needed for the feature/fix]: no
[Is the change risky?]: no, it only affects on 32bit win7.
[Why is the change risky/not risky?]: it reduces the table size only, from 7 days long mp4 video to 1 day.
[String changes made/needed]: none
Flags: needinfo?(ayang)
Attachment #8897316 - Flags: approval-mozilla-beta?
Hi Florin, could you help find someone to verify if this issue was fixed as expected on the latest Nightly build? Thanks!
Flags: qe-verify+
Flags: needinfo?(florin.mezei)
Comment on attachment 8897316 [details]
Bug 1389470 - reduce table size on 32bit arch to avoid OOM.

Avoiding an OOM sounds good, let's take this for beta 5. Verification would still be nice but I don't think it's necessary before uplift.
Attachment #8897316 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Unfortunately, it doesn't fix it.

https://bugzilla.mozilla.org/show_bug.cgi?id=1392375#c4
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Gerry Chang [:gchang] from comment #12)
> Hi Florin, could you help find someone to verify if this issue was fixed as
> expected on the latest Nightly build? Thanks!

Removing needinfo since it still reproduces based on the previous comment.
Flags: needinfo?(florin.mezei)
Should we add the crash signature from bug 1392375 to this bug?
Flags: needinfo?(ayang)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #18)
> Should we add the crash signature from bug 1392375 to this bug?

Yes
Flags: needinfo?(ayang)
Crash Signature: [@ alloc::oom::oom] → [@ alloc::oom::oom] [@ alloc::oom::oom | alloc::raw_vec::RawVec<T>::double<T>]
Most of them happen on win32, I'll disable mp4 rust parser on win32 before Rust supports fallible collection allocation [1].

[1] https://github.com/rust-lang/rfcs/pull/2116
Attachment #8897316 - Attachment is obsolete: true
Comment on attachment 8904148 [details]
Bug 1389470 - disable rust parser on win32 due to OOM.

https://reviewboard.mozilla.org/r/175928/#review181092

Bummer :-(
Attachment #8904148 - Flags: review?(kinetik) → review+
Tracking for 56 to make sure we remember to uplift/land this patch.
(In reply to Matthew Gregan [:kinetik] from comment #22)
> Comment on attachment 8904148 [details]
> Bug 1389470 - disable rust parser on win32 due to OOM.
> 
> https://reviewboard.mozilla.org/r/175928/#review181092
> 
> Bummer :-(

I don't like it either, really.

So I'm thinking other alternatives. I've checked these crash reports, most of OOM happening when vector tries to double its size via push(). It may improve performance and reduce memory pressure if we can use Vec::with_capacity instead of Vec::push [1] in mp4 tables.
The remaining problem is we need to find a way to detect memory pressure before calling Vec::with_capacity.

[1] http://markusjais.com/unterstanding-rusts-vec-and-its-capacity-for-fast-and-efficient-programs/
We only have a couple of beta builds/releases left so there is some time pressure. This should probably be a blocking issue for 56 given the high volume.
(In reply to Alfredo Yang (:alfredo) from comment #24)
> So I'm thinking other alternatives. I've checked these crash reports, most
> of OOM happening when vector tries to double its size via push(). It may
> improve performance and reduce memory pressure if we can use
> Vec::with_capacity instead of Vec::push [1] in mp4 tables.
> The remaining problem is we need to find a way to detect memory pressure
> before calling Vec::with_capacity.

FWIW, the Stylo folks have been working on writing portable extensions to Vec that enable a subset of fallible operations.   You may want to look around the tree and/or talk to them to see if you can reuse their work here.
Priority: -- → P1
(In reply to Nathan Froyd [:froydnj] from comment #26)
> FWIW, the Stylo folks have been working on writing portable extensions to
> Vec that enable a subset of fallible operations.   You may want to look
> around the tree and/or talk to them to see if you can reuse their work here.

Thanks, I've checked it with :heycam and I'll reuse their works on bug 1389009.
Attachment #8904148 - Attachment is obsolete: true
Comment on attachment 8906976 [details]
Bug 1389470 - use fallible memory allocation to avoid OOM.

https://reviewboard.mozilla.org/r/178734/#review183722
Attachment #8906976 - Flags: review?(kinetik) → review+
Heads up that the beta 12 build will start on Wednesday evening. If we miss that, this can still land for the RC build on Monday.
We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again.

hg error in cmd: hg rebase -s 110483b3a613 -d 8475a8df9f16: rebasing 419427:110483b3a613 "Bug 1389470 - use fallible memory allocation to avoid OOM. r=kinetik" (tip)
merging toolkit/library/gtest/rust/Cargo.lock
merging toolkit/library/rust/Cargo.lock
warning: conflicts while merging toolkit/library/gtest/rust/Cargo.lock! (edit, then use 'hg resolve --mark')
warning: conflicts while merging toolkit/library/rust/Cargo.lock! (edit, then use 'hg resolve --mark')
unresolved conflicts (see hg resolve, then hg rebase --continue)
(In reply to Mozilla Autoland from comment #34)
> We're sorry, Autoland could not rebase your commits for you automatically.
> Please manually rebase your commits and try again.
> 
> hg error in cmd: hg rebase -s 110483b3a613 -d 8475a8df9f16: rebasing
> 419427:110483b3a613 "Bug 1389470 - use fallible memory allocation to avoid
> OOM. r=kinetik" (tip)
> merging toolkit/library/gtest/rust/Cargo.lock
> merging toolkit/library/rust/Cargo.lock
> warning: conflicts while merging toolkit/library/gtest/rust/Cargo.lock!
> (edit, then use 'hg resolve --mark')
> warning: conflicts while merging toolkit/library/rust/Cargo.lock! (edit,
> then use 'hg resolve --mark')
> unresolved conflicts (see hg resolve, then hg rebase --continue)

fallible conflicts with the stylo one, I'll change its name since we modified it a lot already.
Attachment #8906976 - Attachment is obsolete: true
Comment on attachment 8907435 [details]
Bug 1389470 - use fallible memory allocation to avoid OOM.

https://reviewboard.mozilla.org/r/179122/#review184348
Attachment #8907435 - Flags: review?(kinetik) → review+
Pushed by ayang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a3b2778fa597
use fallible memory allocation to avoid OOM. r=kinetik
Crash Signature: [@ alloc::oom::oom] [@ alloc::oom::oom | alloc::raw_vec::RawVec<T>::double<T>] → [@ alloc::oom::oom] [@ alloc::oom::oom | alloc::raw_vec::RawVec<T>::double<T>] [@ OOM | unknown | alloc::oom::oom | alloc::raw_vec::RawVec<T>::double<T>]
Oops, didn't get the backout included in the merge:

Cherry-picked it to https://hg.mozilla.org/mozilla-central/rev/0e706b53052c86b247bb8037d1a3fbf2e913045e
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla57 → ---
Can you request uplift? Thanks.
(In reply to Alfredo Yang (:alfredo) from comment #27)
> Thanks, I've checked it with :heycam and I'll reuse their works on bug
> 1389009.

Is this going to work on Beta?
Comment on attachment 8907908 [details]
Bug 1389470 - fix for zero size memory leak.

https://reviewboard.mozilla.org/r/179584/#review184778
Attachment #8907908 - Flags: review?(kinetik) → review+
Mass change P1->P2 to align with new Mozilla triage process
Priority: P1 → P2
Pushed by ayang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8d242a53eca7
use fallible memory allocation to avoid OOM. r=kinetik
https://hg.mozilla.org/integration/autoland/rev/4f22ef5d8058
fix for zero size memory leak. r=kinetik
https://hg.mozilla.org/mozilla-central/rev/8d242a53eca7
https://hg.mozilla.org/mozilla-central/rev/4f22ef5d8058
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
This is a release blocking issue so I think P1 is correct here.
Priority: P2 → P1
Alfredo (or kinetik if you see this first) Can you request uplift to m-r?  I'd like to get this patch into the release candidate build on Monday.  

Andrei, can your team verify the fix with the STR in comment 11? Thanks.
Flags: needinfo?(kinetik)
Flags: needinfo?(andrei.vaida)
I assume bug 1389009 will also need an uplift request for this patch to work.
Anthony, can you help out here to get things straightened out for Monday's 56 RC build, since this is a release blocking issue?
Flags: needinfo?(ajones)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #53)
> I assume bug 1389009 will also need an uplift request for this patch to work.

Not by my reading of the patch. It looks like they just copy-pasted the relevant code into mp4parse (which is fine).
This uplifts 8d242a53eca7 and 4f22ef5d8058 from central.  I didn't update update-rust.sh's embedded revision since we're just uplifting the fallible Vec changes.

Approval Request Comment
[Feature/Bug causing the regression]: inherent in Rust language design, so problem introduced when mp4parse was written in Rust
[User impact if declined]: crash on OOM, fairly common on 32-bit
[Is this code covered by automated tests?]: yes
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: n/a
[List of other uplifts needed for the feature/fix]: n/a
[Is the change risky?]: not particularly
[Why is the change risky/not risky?]: affects core Vec allocation, so code executed on every use of MP4 parser
[String changes made/needed]: none
Flags: needinfo?(kinetik)
Attachment #8908900 - Flags: review+
Attachment #8908900 - Flags: approval-mozilla-release?
Comment on attachment 8908900 [details] [diff] [review]
bug1389470_uplift.patch

Avoid an mp4 crash, let's uplift to m-r.
Attachment #8908900 - Flags: approval-mozilla-release? → approval-mozilla-release+
Matthew,
Thanks for your help!
Flags: needinfo?(ayang)
Flags: needinfo?(ajones)
I have managed to reproduce this crash [1], [2], [3] using a 12hr long mp4 video played simultaneously in a few tabs (8 or 10 tabs), alongside with other youtube videos. Tested on Windows 7 x86, Beta 56.0b1:
[1] https://crash-stats.mozilla.com/report/index/faf72fef-3fe6-4fa5-a83b-da25c0170921
[2] https://crash-stats.mozilla.com/report/index/0507ae7a-98d1-4c9c-bc83-f0d450170921
[3] https://crash-stats.mozilla.com/report/index/777f27e1-592e-4ab5-910b-50cfe0170921

I can't reproduce this crash anymore with 56.0 RC (20170918210324) and latest Nightly 57.01 (2017-09-21) on Windows 7 x86. (following the same scenario from above).
Status: RESOLVED → VERIFIED
Flags: needinfo?(andrei.vaida)
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: