crash in NS_ABORT_OOM(unsigned int) | PL_DHashTableInit(PLDHashTable*, PLDHashTableOps const*, void*, unsigned int, unsigned int) | nsCycleCollector::BeginCollection(ccType, nsICycleCollectorListener*)

RESOLVED DUPLICATE of bug 1006181

Status


defect
--
critical
RESOLVED DUPLICATE of bug 1006181
5 years ago
5 years ago

People

(Reporter: jbecerra, Unassigned)

Tracking

({crash})

32 Branch
x86
Windows NT
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(crash signature)

This bug was filed from the Socorro interface and is report bp-9dafed22-c6c4-4728-8566-52e322140430.
=============================================================

Currently a top crasher on nightly. All report comments mention watching videos, and a couple of them mention that the moment where the browser crashed coincided with the "Are you still watching?" overlay showing up while watching a video.

Mostly happening on Windows 7 and 8.

0 	xul.dll 	NS_ABORT_OOM(unsigned int) 	xpcom/base/nsDebugImpl.cpp
1 	xul.dll 	PL_DHashTableInit(PLDHashTable *,PLDHashTableOps const *,void *,unsigned int,unsigned int) 	xpcom/glue/pldhash.cpp
2 	xul.dll 	nsCycleCollector::BeginCollection(ccType,nsICycleCollectorListener *) 	xpcom/base/nsCycleCollector.cpp
3 	xul.dll 	nsCycleCollector::Collect(ccType,js::SliceBudget &,nsICycleCollectorListener *) 	xpcom/base/nsCycleCollector.cpp
4 	xul.dll 	nsCycleCollector_collectSlice(__int64) 	xpcom/base/nsCycleCollector.cpp
5 	xul.dll 	nsJSContext::RunCycleCollectorSlice() 	dom/base/nsJSEnvironment.cpp
6 	xul.dll 	CCTimerFired 	dom/base/nsJSEnvironment.cpp
7 	xul.dll 	nsTimerImpl::Fire() 	xpcom/threads/nsTimerImpl.cpp
8 	xul.dll 	nsTimerEvent::Run() 	xpcom/threads/nsTimerImpl.cpp
9 	xul.dll 	nsThread::ProcessNextEvent(bool,bool *) 	xpcom/threads/nsThread.cpp
10 	xul.dll 	NS_ProcessNextEvent(nsIThread *,bool) 	xpcom/glue/nsThreadUtils.cpp
11 	xul.dll 	mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate *) 	ipc/glue/MessagePump.cpp
12 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc
13 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc
14 	xul.dll 	nsBaseAppShell::Run() 	widget/xpwidgets/nsBaseAppShell.cpp
15 	xul.dll 	nsAppShell::Run() 	widget/windows/nsAppShell.cpp
16 	xul.dll 	nsAppStartup::Run() 	toolkit/components/startup/nsAppStartup.cpp
17 	xul.dll 	XREMain::XRE_mainRun() 	toolkit/xre/nsAppRunner.cpp
18 	xul.dll 	XREMain::XRE_main(int,char * * const,nsXREAppData const *) 	toolkit/xre/nsAppRunner.cpp
19 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp
20 	firefox.exe 	do_main 	browser/app/nsBrowserApp.cpp
21 	firefox.exe 	NS_internal_main(int,char * *) 	browser/app/nsBrowserApp.cpp
22 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp
23 	firefox.exe 	__tmainCRTStartup 	f:/dd/vctools/crt_bld/self_x86/crt/src/crtexe.c
24 	kernel32.dll 	BaseThreadInitThunk 	
25 	ntdll.dll 	__RtlUserThreadStart 	
26 	ntdll.dll 	_RtlUserThreadStart
Version: 29 Branch → 32 Branch
OOMAllocationSize: 262144

We're not going to fix the OOM itself; we need to understand what's going on under the hood. Juan, can you work with your team to try to get STR (steps to reproduce), and/or email people who are seeing this to ask for details?

Not sure whether the netflix "still watching" popup is a modal dialog or what.
If this just popped up in the last few days, it is likely a regression from ICC (incremental cycle collection) somehow. We could be failing to run the CC aggressively enough in some conditions, so we OOM more.

> Not sure whether the netflix "still watching" popup is a modal dialog or what.

It just looks like something generated within the Silverlight plugin itself.  See the second image here:
http://www.groovypost.com/news/disable-netflix-post-play-tv-season-binge/
Blocks: 911246
This sounds like a dupe of bug 976217, down to the Netflix connection.  This is one of the largest contiguous allocations we make in the browser, so maybe it isn't actually related to ICC at all.
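To make the size concrete: the reported OOMAllocationSize of 262144 bytes is exactly 256 KiB, which has to be obtained as one contiguous block. The sketch below shows how a hash table's entry store can reach that figure. The entry size and capacity are illustrative assumptions chosen to match the reported number; they are not taken from the crash report or the source tree, and `EntryStoreBytes` is a hypothetical helper, not a Gecko function.

```cpp
#include <cstddef>

// ASSUMPTION: illustrative figures only, not read from the crash report.
// An entry holding a 4-byte hash plus a 4-byte pointer on 32-bit x86.
constexpr std::size_t kAssumedEntrySize = 8;
// ASSUMPTION: an initial capacity of 32768 slots.
constexpr std::size_t kAssumedCapacity = 32768;

// Hypothetical helper: an open-addressed hash table keeps all of its slots
// in one array, so the initial request is entrySize * capacity bytes.
constexpr std::size_t EntryStoreBytes(std::size_t entrySize,
                                      std::size_t capacity) {
  return entrySize * capacity;
}

// Under these assumptions the request matches the reported size.
static_assert(EntryStoreBytes(kAssumedEntrySize, kAssumedCapacity) == 262144,
              "8 bytes * 32768 slots = 262144 bytes");
static_assert(262144 == 256 * 1024, "262144 bytes is 256 KiB");
```

A request of this size needs a free contiguous range in the address space, so on a 32-bit process it can fail under fragmentation even when smaller allocations still succeed.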
This very likely points to cache v2 being reenabled on nightly, bug 1004185. Although changes to highwater memory usage from either place can cause something like this to show up.

We could really use an about:memory dump before the crash.

I don't understand why we turned cache v2 on if there were known-unfixed issues from the first deployment.
Blocks: 1004185
Flags: needinfo?(honzab.moz)
No longer blocks: 911246
So, if it is a dupe of bug 976217, why does it crash in nsCycleCollector instead of in cache2 code?
If we're low on memory, we can end up crashing in various places.  The cycle collector just happens to allocate a large block of memory.
OK, so it means that using a fallible allocator won't fix anything, right?
The cycle collector is an essential part of freeing memory, so if we can't run it because we're low on memory, we're doomed anyways.
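The fallible/infallible distinction in this exchange can be sketched roughly as follows. This is a minimal illustration using only the standard `operator new(std::nothrow)`. The names `AllocFallible`, `AllocInfallible`, and `TryBeginCollection` are hypothetical and are not Gecko APIs; the real allocator wrappers and the real collector are considerably more involved.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <new>

// Hypothetical helper: fallible allocation. Returns nullptr on failure and
// leaves the decision about what to do next to the caller.
void* AllocFallible(std::size_t bytes) {
  return ::operator new(bytes, std::nothrow);
}

// Hypothetical helper: infallible allocation. Never returns nullptr; the
// process aborts instead, which is what shows up as an OOM crash.
void* AllocInfallible(std::size_t bytes) {
  void* p = AllocFallible(bytes);
  if (!p) {
    std::fprintf(stderr, "out of memory allocating %zu bytes\n", bytes);
    std::abort();
  }
  return p;
}

// Hypothetical stand-in for starting a collection with a fallible table
// allocation. If the allocation fails, the collection is skipped, so no
// garbage is freed and the low-memory condition persists.
bool TryBeginCollection(std::size_t tableBytes) {
  void* table = AllocFallible(tableBytes);
  if (!table) {
    return false;  // No crash here, but nothing was collected either.
  }
  // ... a real collector would build its graph in the table here ...
  ::operator delete(table);
  return true;
}
```

Switching the caller from `AllocInfallible` to `AllocFallible` only moves the failure: a `false` return from `TryBeginCollection` avoids the immediate abort, but the memory the collection would have released stays allocated, so a later allocation elsewhere is likely to fail instead.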
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #4)
> This very likely points to cache v2 being reenabled on nightly, bug 1004185.
> Although changes to highwater memory usage from either place can cause
> something like this to show up.
> 
> We could really use an about:memory dump before the crash.
> 
> I don't understand why we turned cache v2 on if there were known-unfixed
> issues from the first deployment.

This was not known.  There was no (top) crash like this before!

What is known is that we use infallible allocators.  Bug 976217 (changing to fallible) was moved (by me) to a later stage, but this bug shows that something probably needs to happen now.

If this is a serious top crasher, we may want to turn cache2 off again now.
Flags: needinfo?(honzab.moz)
There were reports of high memory usage on the nightly threads. This post found 10GB (x64) of cache2 in about:memory -- http://forums.mozillazine.org/viewtopic.php?p=13519625#p13519625
Depends on: 1006181
Thanks for this link!  It's memory-only cache.  So it seems like a bug in purging out the memory cache data.  I'll look at it - bug 1006181.
I haven't been able to reproduce the crash, but just by loading a Netflix video I can reliably get the browser to hang, for example when fast-forwarding or jumping forward in the timeline. I tried this on a VM, and it would often become unresponsive.

Bug 1006263 seems related and Robert posted some memory snapshots there as well.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1006181