Closed Bug 1282214 Opened 3 years ago Closed 3 years ago

Crash in OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xmalloc | mp4parse_new

Categories

(Core :: Audio/Video: Playback, defect, P1, critical)

48 Branch
x86
Windows 7
defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox47 --- unaffected
firefox48 + fixed
firefox49 + fixed
firefox50 + unaffected

People

(Reporter: philipp, Assigned: rillian)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(2 files)

[Tracking Requested - why for this release]:
new crash signature first showing up in firefox 48 beta builds. looking related to the new rust parsing of mp4.

This bug was filed from the Socorro interface and is 
report bp-978e1134-f3a5-4a0d-b9af-6ff002160625.
=============================================================
Crashing Thread (98)
Frame 	Module 	Signature 	Source
0 	mozglue.dll 	mozalloc_abort(char const* const) 	memory/mozalloc/mozalloc_abort.cpp:33
1 	mozglue.dll 	mozalloc_handle_oom(unsigned int) 	memory/mozalloc/mozalloc_oom.cpp:46
2 	mozglue.dll 	moz_xmalloc 	memory/mozalloc/mozalloc.cpp:85
3 	xul.dll 	mp4parse_new 	
4 		@0xb46771f 	
5 	mozglue.dll 	mozglue.dll@0x510f 	
6 	xul.dll 	mp4_demuxer::MP4Metadata::MP4Metadata(mp4_demuxer::Stream*) 	media/libstagefright/binding/MP4Metadata.cpp:128
7 	xul.dll 	mozilla::MP4Demuxer::Init() 	dom/media/fmp4/MP4Demuxer.cpp:134
8 	xul.dll 	mozilla::MediaFormatReader::AsyncReadMetadata() 	dom/media/MediaFormatReader.cpp:264
9 	xul.dll 	mozilla::detail::ProxyRunnable<mozilla::MozPromise<RefPtr<mozilla::MetadataHolder>, mozilla::ReadMetadataFailureReason, 1>, mozilla::MediaDecoderReader>::Run() 	obj-firefox/dist/include/mozilla/MozPromise.h:960
10 	xul.dll 	mozilla::AutoTaskDispatcher::TaskGroupRunnable::Run() 	obj-firefox/dist/include/mozilla/TaskDispatcher.h:192
11 	xul.dll 	mozilla::TaskQueue::Runner::Run() 	xpcom/threads/TaskQueue.cpp:172
12 	xul.dll 	nsThreadPool::Run() 	xpcom/threads/nsThreadPool.cpp:228
13 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:994
14 	xul.dll 	NS_ProcessNextEvent(nsIThread*, bool) 	xpcom/glue/nsThreadUtils.cpp:290
15 	xul.dll 	mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:369
16 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:223
17 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:203
18 	xul.dll 	nsThread::ThreadFunc(void*) 	xpcom/threads/nsThread.cpp:396
19 	nss3.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c:397
20 	nss3.dll 	pr_root 	nsprpub/pr/src/md/windows/w95thred.c:95
21 	msvcr120.dll 	_callthreadstartex 	f:\dd\vctools\crt\crtw32\startup\threadex.c:376
22 	msvcr120.dll 	msvcr120.dll@0x2c000 	
23 	kernel32.dll 	BaseThreadInitThunk 	
24 	ntdll.dll 	__RtlUserThreadStart 	
25 	ntdll.dll 	_RtlUserThreadStart
hi, maybe you could help here.
Flags: needinfo?(giles)
Well, slight improvement in stack traces.

Nathan, what does mp4parse_new (rust code) calling moz_xalloc on Windows 7 mean for our allocator story?

I also don't understand what it means that this is an OOM large; the MediaContext object it allocates is tens of bytes. https://hg.mozilla.org/releases/mozilla-beta/annotate/13b02b96281e/media/libstagefright/binding/mp4parse/lib.rs#l282
Assignee: nobody → giles
Flags: needinfo?(giles) → needinfo?(nfroyd)
I don't think this is a Rust crash; looking at the minidump, I think the stackwalker got confused and we're actually crashing here:

http://dxr.mozilla.org/mozilla-central/source/media/libstagefright/binding/MP4Metadata.cpp#470

Assuming I'm reading the stack right, I think we're trying to resize the buffer to be 1MB and having problems finding a block that large.  That would also make more sense for an OOM large being reported...and indeed, the allocation size in the crash report is exactly 1MB.

I guess the right thing to do here is convert that code to use nsTArray and a fallible allocation?
Flags: needinfo?(nfroyd)
(In reply to Nathan Froyd [:froydnj] from comment #3)

> Assuming I'm reading the stack right, I think we're trying to resize the
> buffer to be 1MB and having problems finding a block that large.  That would
> also make more sense for an OOM large being reported...and indeed, the
> allocation size in the crash report is exactly 1MB.

Ah, that makes much more sense.

> I guess the right thing to do here is convert that code to use nsTArray and
> a fallible allocation?

Probably. We could also reduce the maximum allocation, but I'd be nervous doing that on beta. In bug 1267887 we replace the copy with a read callback, which should greatly reduce the allocation.

We can also just disable MOZ_RUST_MP4PARSE on win32 on beta. Philipp, how serious is this? It's not in the top 100 crashes, but ~100 crashes a week is higher than I'd like. If we're failing to allocate 1 MB likely the video wouldn't have played anyway, but of course getting a 'video could not be decoded' message is better than an OOM.
Proposed patch to just disable the feature on beta. There should be no impact on users other than fewer crashes, but we'll lose the broader testing for when the allocation succeeds.

I'll also prepare a fallible allocator patch, but wanted to have this ready to go as a fallback; if we're turning this off, sooner is better than later.
Attachment #8766113 - Flags: review?(mshal)
Attachment #8766113 - Flags: review?(ajones)
Attachment #8766113 - Flags: review?(ajones) → review+
(In reply to Ralph Giles (:rillian) needinfo me from comment #4)
> Philipp, how serious is this? It's not in the top 100 crashes,
> but ~100 crashes a week is higher than I'd like.

i agree that it isn't a high impact issue - the signature is just 0.07% of 48.0b crashes.
Patch to return an error instead of crashing on the allocation failure.

Nathan, is this sufficient to check the resize(), or do I need to declare nsFallibleTArray?
Attachment #8766514 - Flags: review?(kinetik)
Attachment #8766514 - Flags: feedback?(nfroyd)
Comment on attachment 8766514 [details] [diff] [review]
Make read buffer fallible for aurora 49

Review of attachment 8766514 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me; we're trying to get rid of FallibleTArray by having fallible versions of SetLength, so one can just use nsTArray everywhere and fallible operations are clearly noted in the code.
Attachment #8766514 - Flags: feedback?(nfroyd) → feedback+
Attachment #8766514 - Flags: review?(kinetik) → review+
Comment on attachment 8766514 [details] [diff] [review]
Make read buffer fallible for aurora 49

Approval Request Comment

[Feature/regressing bug #]: Bug 1161350 (rust mp4 parser)

[User impact if declined]: Crashes watching videos in low memory conditions.

[Describe test coverage new/current, TreeHerder]: local testing only; the issue was addressed differently in nightly.

[Risks and why]: The main risk is that the change would trigger some other issue with a higher frequency than this bug by pushing the OOM somewhere else. I think the risk is worthwhile if we can reduce crashes, and get some indication of safety should we decide we want this on 48 beta as well.

[String/UUID change made/needed]: None.
Attachment #8766514 - Flags: approval-mozilla-aurora?
Attachment #8766113 - Flags: review?(mshal) → review+
Comment on attachment 8766514 [details] [diff] [review]
Make read buffer fallible for aurora 49

May reduce OOM crashes in aurora. OK to land directly in aurora.
Attachment #8766514 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
rillian, beta 6 (of 10) will go to build next thursday morning, so maybe you and philipp can help assess the results of this in aurora before then.  I'm willing to try it but the earlier in beta we land, the better, so we have time to judge it there and back out if needed.
Flags: needinfo?(giles)
i only see this signature in 48 beta builds so far (it's not on aurora at all), so we probably can't draw any conclusions from the patch landing on aurora.
Track this as this issue happened in 48 beta for a while.
Comment on attachment 8766514 [details] [diff] [review]
Make read buffer fallible for aurora 49

phillipp is correct that we don't have any reports from 49. In light of that I'd like to try this on 48 beta as well.

Approval Request Comment
[Feature/regressing bug #]: Bug 1161350 (rust mp4 parser)

[User impact if declined]: Crashes playing video in low memory conditions (~100/week on beta)

[Describe test coverage new/current, TreeHerder]: Landed on m-a.

[Risks and why]: Since we're hitting an OOM, it's likely we won't be able to play the video anyway; this patch just converts the crash to a more-friendly playback failure. The risk would be that by postponing the OOM we cause more crashes elsewhere. If we do judge it worse, backout or disabling the feature entirely are straightforward.

[String/UUID change made/needed]: None
Flags: needinfo?(giles)
Attachment #8766514 - Flags: approval-mozilla-beta?
Comment on attachment 8766514 [details] [diff] [review]
Make read buffer fallible for aurora 49

Review of attachment 8766514 [details] [diff] [review]:
-----------------------------------------------------------------

This patch fixes a crash. Take it in 48 beta 6. And, we need to keep an eye on this fix and see if any side effect happens.
Attachment #8766514 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
I don't see any crash reports for 48 beta 5, so either we've driven off everyone who was hitting this issue, or something else shadowed it before we could verify the fix. Will keep watching as beta 6 comes out.
No reports since beta4. Treating as fixed.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.