Firefox crash in OOM | small | mp4_demuxer::MP4Sample::Replace

RESOLVED FIXED in Firefox 37

Status

()

P1
critical
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: marcia, Assigned: bobbyholley)

Tracking

(Blocks: 1 bug, {crash, reproducible})

36 Branch
mozilla38
x86
Windows 7
crash, reproducible
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox36 unaffected, firefox37+ fixed, firefox38 fixed)

Details

(crash signature, URL)

Attachments

(6 attachments, 1 obsolete attachment)

This bug was filed from the Socorro interface and is 
report bp-95401fe1-cc89-4c00-8ad3-096ad2150129.
=============================================================

STR:
1. Older Windows machine, with only IRC client running.
2. Download and install a fresh copy of Beta 4.
3. Begin playing the video in the URL
4. Only minutes into the video playing - Crash, and after restart another crash

Second crash after restart shows a different signature in layers: https://crash-stats.mozilla.com/report/index/32e38556-ec92-47d9-9157-14de02150129

Frame 	Module 	Signature 	Source
0 	mozalloc.dll 	mozalloc_abort(char const* const) 	memory/mozalloc/mozalloc_abort.cpp
1 	mozalloc.dll 	mozalloc_handle_oom(unsigned int) 	memory/mozalloc/mozalloc_oom.cpp
2 	mozalloc.dll 	moz_xmalloc 	memory/mozalloc/mozalloc.cpp
3 	xul.dll 	mp4_demuxer::MP4Sample::Replace(unsigned char const*, unsigned int) 	media/libstagefright/binding/DecoderData.cpp
4 	xul.dll 	mp4_demuxer::AnnexB::ConvertSampleToAnnexB(mp4_demuxer::MP4Sample*) 	media/libstagefright/binding/AnnexB.cpp
5 	xul.dll 	mozilla::WMFVideoMFTManager::Input(mp4_demuxer::MP4Sample*) 	dom/media/fmp4/wmf/WMFVideoMFTManager.cpp
6 	xul.dll 	mozilla::WMFMediaDataDecoder::ProcessDecode(mp4_demuxer::MP4Sample*) 	dom/media/fmp4/wmf/WMFMediaDataDecoder.cpp
7 	xul.dll 	nsRunnableMethodImpl<void ( mozilla::DataStorage::*)(char const*), char const*, 1>::Run() 	xpcom/glue/nsThreadUtils.h
8 	xul.dll 	mozilla::MediaTaskQueue::Runner::Run() 	dom/media/MediaTaskQueue.cpp
9 	xul.dll 	nsThreadPool::Run() 	xpcom/threads/nsThreadPool.cpp
10 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp
11 	xul.dll 	mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp
12 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc
13 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc
14 	xul.dll 	nsThread::ThreadFunc(void*) 	xpcom/threads/nsThread.cpp
15 	nss3.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c
16 	nss3.dll 	pr_root 	nsprpub/pr/src/md/windows/w95thred.c
17 	msvcr120.dll 	_callthreadstartex 	f:\dd\vctools\crt\crtw32\startup\threadex.c:376
18 	msvcr120.dll 	msvcr120.dll@0x2c000 	
19 	kernel32.dll 	BaseThreadInitThunk 	
20 	ntdll.dll 	__RtlUserThreadStart 	
21 	ntdll.dll 	_RtlUserThreadStart 	

ni on Parul to provide about:support information since it is her machine. We can also grab some memory logging.
Flags: needinfo?(pmathur)
From about:support :-

Application Basics
------------------

Name: Firefox
Version: 36.0
User Agent: Mozilla/5.0 (Windows NT 6.1; rv:36.0) Gecko/20100101 Firefox/36.0
Multiprocess Windows: 0/1

Crash Reports for the Last 3 Days
---------------------------------

Report ID: bp-0972126a-a437-4575-8ad1-141192150129
Submitted: 4 minutes ago

Report ID: bp-32e38556-ec92-47d9-9157-14de02150129
Submitted: 31 minutes ago

Report ID: bp-95401fe1-cc89-4c00-8ad3-096ad2150129
Submitted: 34 minutes ago

All Crash Reports (including 1 pending crash in the given time range)

Extensions
----------

Graphics
--------

Adapter Description: Mobile Intel(R) 4 Series Express Chipset Family
Adapter Drivers: igdumdx32 igd10umd32 igd10umd32
Adapter RAM: Unknown
Device ID: 0x2a42
Direct2D Enabled: Blocked for your graphics driver version. Try updating your graphics driver to version 8.1500.1000.2202 or newer.
DirectWrite Enabled: false (6.2.9200.16571)
Driver Date: 8-27-2009
Driver Version: 8.15.10.1883
GPU #2 Active: false
GPU Accelerated Windows: 1/1 Direct3D 11 (OMTC)
Subsys ID: 145810cf
Vendor ID: 0x8086
WebGL Renderer: Google Inc. -- ANGLE (Mobile Intel(R) 4 Series Express Chipset Family Direct3D9Ex vs_3_0 ps_3_0)
windowLayerManagerRemote: true
AzureCanvasBackend: skia
AzureContentBackend: cairo
AzureFallbackCanvasBackend: cairo
AzureSkiaAccelerated: 0

Important Modified Preferences
------------------------------

browser.cache.disk.capacity: 358400
browser.cache.disk.smart_size.first_run: false
browser.cache.frecency_experiment: 1
browser.places.smartBookmarksVersion: 7
browser.startup.homepage_override.buildID: 20150126151838
browser.startup.homepage_override.mstone: 36.0
dom.mozApps.used: true
extensions.lastAppVersion: 36.0
media.gmp-gmpopenh264.lastUpdate: 1422570517
media.gmp-gmpopenh264.version: 1.1
media.gmp-manager.lastCheck: 1422570516
network.cookie.prefsMigrated: true
places.history.expiration.transient_current_max_pages: 78743
plugin.disable_full_page_plugin_for_types: application/pdf
privacy.sanitize.migrateFx3Prefs: true

Important Locked Preferences
----------------------------

JavaScript
----------

Incremental GC: true

Accessibility
-------------

Activated: true
Prevent Accessibility: 0

Library Versions
----------------

NSPR
Expected minimum version: 4.10.7
Version in use: 4.10.7

NSS
Expected minimum version: 3.17.4 Basic ECC
Version in use: 3.17.4 Basic ECC

NSSSMIME
Expected minimum version: 3.17.4 Basic ECC
Version in use: 3.17.4 Basic ECC

NSSSSL
Expected minimum version: 3.17.4 Basic ECC
Version in use: 3.17.4 Basic ECC

NSSUTIL
Expected minimum version: 3.17.4
Version in use: 3.17.4

Experimental Features
---------------------
Flags: needinfo?(pmathur)
Blocks: 778617
We are retesting this with a new profile to see what happens.
The new profile doesn't seem to be crashing - her original profiler had the Gecko profiler extension installed.
Need info on marcia to retest on this machine again.
Flags: needinfo?(mozillamarcia.knous)

Updated

4 years ago
Priority: -- → P1
Sample buffer allocation clearly needs to be fallible. I can take this.
Assignee: nobody → bobbyholley
Flags: needinfo?(mozillamarcia.knous)
(In reply to Bobby Holley (Busy with media, don't ask for DOM/JS/XPConnect things) from comment #6)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=e2d872583509

With follow-up to green up b2g: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ba245ff0232c
Marica, can we please can an about:memory report as close before the crash as possible?

Bobby's patches should fix this specific crash, but running out of memory a few minutes into a video probably suggests a bigger problem.
Flags: needinfo?(mozillamarcia.knous)
(In reply to Matt Woodrow (:mattwoodrow) from comment #13)
> Marica, can we please can an about:memory report as close before the crash
> as possible?
> 
> Bobby's patches should fix this specific crash, but running out of memory a
> few minutes into a video probably suggests a bigger problem.

I will try to repro this if Parul bring in her machine - it is her personal laptop that we occasionally run tests on.
Flags: needinfo?(mozillamarcia.knous)
Comment on attachment 8563509 [details] [diff] [review]
Part 1 - Get rid of infallible allocation in MP4Sample copy constructor. v1

Review of attachment 8563509 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/fmp4/apple/AppleATDecoder.cpp
@@ +83,5 @@
>    mTaskQueue->Dispatch(
>        NS_NewRunnableMethodWithArg<nsAutoPtr<mp4_demuxer::MP4Sample>>(
>          this,
>          &AppleATDecoder::SubmitSample,
> +        s));

This previously was just taking ownership of the pointer, not cloning wasn't it? Aren't we now leaking aSample now?
Attachment #8563510 - Flags: review?(matt.woodrow) → review+
Attachment #8563511 - Flags: review?(matt.woodrow) → review+
Attachment #8563512 - Flags: review?(matt.woodrow) → review+
Attachment #8563513 - Flags: review?(matt.woodrow) → review+
(In reply to Matt Woodrow (:mattwoodrow) from comment #15)
> This previously was just taking ownership of the pointer, not cloning wasn't
> it? Aren't we now leaking aSample now?

Great catch, you're totally right (and, checking now, try confirms the leak). Removing the copy constructor caused a build error in that file (from the stuff below), and when I skimmed the file, I mistook the AutoPtr constructor for a copy constructor invocation. Fix coming up.
Attachment #8563509 - Attachment is obsolete: true
Attachment #8563509 - Flags: review?(matt.woodrow)
Attachment #8563783 - Flags: review?(matt.woodrow)
Attachment #8563783 - Flags: review?(matt.woodrow) → review+
Flagging Ralph for uplift.
Flags: needinfo?(giles)
status-firefox36: --- → unaffected
status-firefox37: --- → affected
Depends on: 1129732
Flags: needinfo?(giles)
Bobby, this doesn't apply cleanly to aurora, especially the first and fourth parts have conflicts in GonkDecoderManager. Sorry for not getting to this earlier.

Would you mind doing a quick backport patchset for uplift? I think it's safer to redo the fallibility checks there than to backport the intervening gonk changes.
Flags: needinfo?(bobbyholley)
tracking-firefox37: --- → +
Comment on attachment 8563510 [details] [diff] [review]
Part 2 - Make MP4Sample::Pad fallible. v1

Requesting 37 uplift for all patches on this bug. This will need a backport to the aurora tree since the GonkDecoderModule code is different there. That should be straightforward. Nominating now in expectation that bholley will prepare patches against 37.

Approval Request Comment
[Feature/regressing bug #]: MSE
[User impact if declined]: Crashes playing youtube video.
[Describe test coverage new/current, TreeHerder]: Landed on m-c.
[Risks and why]: This touches a fair amount of code but that's mostly error propagation. This has had some time to bake on nightly. Regression risk looks low at this point.
[String/UUID change made/needed]: None.
Attachment #8563510 - Flags: approval-mozilla-aurora?
Comment on attachment 8563510 [details] [diff] [review]
Part 2 - Make MP4Sample::Pad fallible. v1

I'm clearing the approval request for the current set of patches as we know that branch specific patches are required. We do want to take this fix in 37 so I expect that this will be easily approved if the patches are similar to what's already posted. If the patches need to differ substantially for some reason, I'll would prefer to see the fix verified for Aurora 37 specifically before landing if possible.
Attachment #8563510 - Flags: approval-mozilla-aurora?
The backport is very straightforward. It just involves removing the parts of the patches that were propagating failure codes into code that didn't exist on 37. I also needed to switch to the old way of doing fallible allocation |new (fallible_t())| instead of |new (fallible)|.

try push for the backport: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b0c5fb0adde9
Posted file backport to 37
This should do the trick.
Flags: needinfo?(bobbyholley)
Attachment #8567433 - Flags: approval-mozilla-aurora?
Comment on attachment 8567433 [details]
backport to 37

I've got to say, I think that this is the first time that I have reviewed patches attached as a zip in Bugzilla. No matter. Thank you for the quick turnaround on the branch specific patch. We absolutely want this stability fix in 37. Aurora+
Attachment #8567433 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
https://treeherder.mozilla.org/#/jobs?repo=mozilla-aurora&revision=d8fdb34c5565

I accidentally pushed this with try syntax - looks like we don't have that commit hook set up on aurora. :-(
status-firefox37: affected → fixed
Bustage (yeah, turning off non-unified builds on trunk has worked out splendidly):
https://treeherder.mozilla.org/logviewer.html#?job_id=619416&repo=mozilla-aurora
Flags: needinfo?(bobbyholley)
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #30)
> Bustage (yeah, turning off non-unified builds on trunk has worked out
> splendidly):

FWIW, I think that's just because bug 1126593 isn't on Aurora.
Backed out.
https://hg.mozilla.org/releases/mozilla-aurora/rev/ec501c1722c0
status-firefox37: fixed → affected
Flags: needinfo?(bobbyholley)
You need to log in before you can comment on or make changes to this bug.