Closed Bug 976141 Opened 7 years ago Closed 7 years ago

crash in mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | GCGraphBuilder::NoteChild(void*, nsCycleCollectionParticipant*, nsCString)

Categories

(Core :: General, defect)

28 Branch
x86
Windows NT
defect
Not set
critical

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox28 + wontfix
firefox29 + wontfix
firefox30 + wontfix

People

(Reporter: u279076, Unassigned)

Details

(Keywords: crash, steps-wanted, topcrash-win)

Crash Data

This bug was filed from the Socorro interface and is 
report bp-a54c6df1-a6d3-4061-a5f8-a906f2140224.
=============================================================
0 	mozalloc.dll 	mozalloc_abort(char const * const) 	memory/mozalloc/mozalloc_abort.cpp
1 	mozalloc.dll 	mozalloc_handle_oom(unsigned int) 	memory/mozalloc/mozalloc_oom.cpp
2 	mozalloc.dll 	moz_xmalloc 	memory/mozalloc/mozalloc.cpp
3 	xul.dll 	GCGraphBuilder::NoteChild(void *,nsCycleCollectionParticipant *,nsCString) 	xpcom/base/nsCycleCollector.cpp
4 	xul.dll 	NoteJSChildTracerShim 	xpcom/base/CycleCollectedJSRuntime.cpp
5 	mozjs.dll 	js::ObjectImpl::markChildren(JSTracer *) 	js/src/vm/ObjectImpl.cpp
6 	mozjs.dll 	JS_TraceChildren(JSTracer *,void *,JSGCTraceKind) 	js/src/gc/Tracer.cpp
7 	xul.dll 	GCGraphBuilder::Traverse(PtrInfo *) 	xpcom/base/nsCycleCollector.cpp
8 	xul.dll 	nsCycleCollector::MarkRoots(js::SliceBudget &) 	xpcom/base/nsCycleCollector.cpp
9 	xul.dll 	nsCycleCollector::Collect(ccType,js::SliceBudget &,nsICycleCollectorListener *) 	xpcom/base/nsCycleCollector.cpp
10 	xul.dll 	nsJSContext::ScheduledCycleCollectNow() 	dom/base/nsJSEnvironment.cpp
11 	xul.dll 	nsTimerEvent::Run() 	xpcom/threads/nsTimerImpl.cpp
12 	xul.dll 	nsThread::ProcessNextEvent(bool,bool *) 	xpcom/threads/nsThread.cpp
13 	ntdll.dll 	EtwEventEnabled 	
14 	xul.dll 	NS_ProcessNextEvent(nsIThread *,bool) 	xpcom/glue/nsThreadUtils.cpp
15 	xul.dll 	mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate *) 	ipc/glue/MessagePump.cpp
16 	xul.dll 	_SEH_epilog4 	
17 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc
18 	xul.dll 	nsBaseAppShell::Run() 	widget/xpwidgets/nsBaseAppShell.cpp
19 	xul.dll 	nsAppShell::Run() 	widget/windows/nsAppShell.cpp
20 	nss3.dll 	nss3.dll@0x7970 	
21 	xul.dll 	XREMain::XRE_main(int,char * * const,nsXREAppData const *) 	toolkit/xre/nsAppRunner.cpp
22 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp
23 	firefox.exe 	do_main 	browser/app/nsBrowserApp.cpp
24 	firefox.exe 	NS_internal_main(int,char * *) 	browser/app/nsBrowserApp.cpp
25 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp
26 	firefox.exe 	__tmainCRTStartup 	f:/dd/vctools/crt_bld/self_x86/crt/src/crtexe.c
27 	kernel32.dll 	BaseThreadInitThunk 	
28 	ntdll.dll 	__RtlUserThreadStart 	
29 	ntdll.dll 	_RtlUserThreadStart 	

More Reports:
https://crash-stats.mozilla.com/report/list?product=Firefox&signature=mozalloc_abort%28char+const%2A+const%29+%7C+mozalloc_handle_oom%28unsigned+int%29+%7C+moz_xmalloc+%7C+GCGraphBuilder%3A%3ANoteChild%28void%2A%2C+nsCycleCollectionParticipant%2A%2C+nsCString%29
=============================================================

Shows up fairly high in the latest Firefox 28 explosiveness reports with a 3-day rating of 2.6. It currently sits at #15 overall on Beta according to this report:
https://crash-analysis.mozilla.com/rkaiser/2014-02-23/2014-02-23.firefox.28.explosiveness.html

Top URLs:
179 	https://www.facebook.com/
56 	about:blank
24 	https://www.facebook.com/?ref=tn_tnmn
13 	about:newtab
10 	https://twitter.com/

Reports of this signature go back to Firefox 15.0b2 in the 7-day report so I don't think this is a Firefox 28 regression. However, something seems to have made this somewhat worse in the last week or so.
This and the signature I added are also rising on Nightly. Some reporter comments point to using Flash or Silverlight at Netflix.
Crash Signature: [@ mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | GCGraphBuilder::NoteChild(void*, nsCycleCollectionParticipant*, nsCString)] → [@ mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | GCGraphBuilder::NoteChild(void*, nsCycleCollectionParticipant*, nsCString)] [@ mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | …
Adding qawanted to see if we can find steps to reproduce using Netflix. Tracy do you have a Netflix account that you regularly use that you can test this with?
I don't have a netflix account.  I wish correlations were working properly.
(In reply to [:tracy] Tracy Walker - QA Mentor from comment #3)
> I don't have a netflix account.  I wish correlations were working properly.

I could dig out some manually from the raw files. Here's some correlations for yesterday on Firefox 28.0:

Modules:

  mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | GCGraphBuilder::NoteChild(void*, nsCycleCollectionParticipant*, nsCString)|EXCEPTION_BREAKPOINT (330 crashes)
     92% (302/330) vs.  37% (14238/38203) cscapi.dll
     93% (307/330) vs.  39% (14851/38203) linkinfo.dll
     94% (309/330) vs.  41% (15526/38203) ntshrui.dll
     85% (279/330) vs.  33% (12419/38203) slc.dll
     84% (277/330) vs.  36% (13910/38203) mf.dll
     84% (276/330) vs.  36% (13901/38203) mfreadwrite.dll
     84% (277/330) vs.  37% (14073/38203) mfplat.dll
     93% (307/330) vs.  46% (17649/38203) duser.dll
     92% (304/330) vs.  46% (17514/38203) dui70.dll
     92% (304/330) vs.  46% (17515/38203) explorerframe.dll
     94% (309/330) vs.  48% (18380/38203) pnrpnsp.dll
     94% (309/330) vs.  48% (18391/38203) NapiNSP.dll
     92% (302/330) vs.  47% (17812/38203) FWPUCLNT.DLL
     84% (276/330) vs.  39% (14842/38203) dxva2.dll
     93% (306/330) vs.  48% (18388/38203) nlaapi.dll

Add-ons:

  mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | GCGraphBuilder::NoteChild(void*, nsCycleCollectionParticipant*, nsCString)|EXCEPTION_BREAKPOINT (330 crashes)
     46% (152/330) vs.   7% (2565/38203) {82AF8DCA-6DE9-405D-BD5E-43525BDAD38A}


There's no 100% correlation in there, but at least some pointers that could be helpful.
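
For readers unfamiliar with the correlation format above: each row compares the share of crashes with this signature that had a module or add-on loaded against the share in the wider comparison set (38203 crashes here). A minimal Python sketch of that arithmetic follows, using three rows copied from the listing. The function name `correlation` and the "lift" ratio are illustrative additions, not part of the crash-stats tooling.

```python
def correlation(sig_hits, sig_total, all_hits, all_total):
    """Return (share within the signature, share overall, ratio of the two)."""
    sig_share = sig_hits / sig_total
    all_share = all_hits / all_total
    return sig_share, all_share, sig_share / all_share


# Rows copied from the listing: (name, sig_hits, sig_total, all_hits, all_total)
rows = [
    ("cscapi.dll", 302, 330, 14238, 38203),
    ("mf.dll", 277, 330, 13910, 38203),
    ("{82AF8DCA-6DE9-405D-BD5E-43525BDAD38A}", 152, 330, 2565, 38203),
]

for name, *counts in rows:
    sig_share, all_share, lift = correlation(*counts)
    # "lift" is a hypothetical label for how over-represented the item is.
    print(f"{sig_share:4.0%} vs. {all_share:4.0%}  lift={lift:.1f}  {name}")
```

A ratio well above 1 only says the item is over-represented among these crashes; as noted above, none of the rows is a 100% correlation.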
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #4)
>      46% (152/330) vs.   7% (2565/38203) {82AF8DCA-6DE9-405D-BD5E-43525BDAD38A}

This is Skype Click-to-Call.
(In reply to [:tracy] Tracy Walker - QA Mentor from comment #3)
> I don't have a netflix account.  I wish correlations were working properly.

Tracy, I think we have test credentials on the Intranet. Would you be willing to test this if I can track those credentials down? Unfortunately I can't ask Softvision to help here since Netflix won't work in their locale.
I just spoke with Rebecca Billings on IRC and she's going to try to find steps for this.
Flags: needinfo?(rbillings)
I tested running Netflix on an old version of Aurora. Then I upgraded to a new version. I enabled silverlight. I made sure flash was up to date and running. I opened Facebook in another tab. 

I was unable to test Skype Click to Call. I downloaded it but it only installed on Safari, and it is not listed on the Extensions list, most likely because I'm running Mavericks and it's not compatible.

I was unable to repro any problem.
Flags: needinfo?(rbillings)
Thanks Rebecca, I neglected to mention this crash is only showing up on Windows right now with the majority occurring on Windows 7 32-bit. Do you have access to a computer running this platform?
I tested on Win7 SP1 with Aurora running. Installing flash and running Facebook changed nothing. While installing Skype click to call I got one error [Netflix error code N8157-6037], but was unable to repro the problem after getting the Skype addon installed on the browser. I was also unable to find a website with a number to call using the addon so if you have suggestions for that I'll give it a try.
Thanks for the help, Rebecca.

This is now up 9 positions to #10 in Firefox 28. Can we get some assistance from Engineering as to what we might test based on the stack?
Flags: needinfo?(release-mgmt)
Keywords: qawanted → steps-wanted
The build that crashed the most was FF 28.0b4 and the OS with the most crashes was Win7. 
From the comments it seems that Facebook and other sites that use the Flash plugin (YouTube, Twitch) are the pages with the most crashes.
With this information in mind I spent 2 hours opening YouTube, Twitch and Facebook pages in several tabs/windows, reaching high memory consumption and doing several session restores, but I didn't manage to reproduce the issue.

I even tried installing Skype Click to Call addon but for some reason the addon was not properly working for me. The add-on appeared on about:addons page as enabled but it was doing nothing.

I used:

FF 28.0b4
Build Id: 20140218122424
OS:Win 7
Product: Firefox → Core
This has now risen 6 positions to #6 in the last week, accounting for 1.17% of our Firefox 28 crashes.
Bsmedberg/Andrew/Khuey -- can any of you see something here that could be tried/tested to get more visibility into this?  It's almost FF28 release time, and the mention of names like Skype and Netflix raises the concern that we're looking at a potentially explosive issue with a larger user population.
Flags: needinfo?(release-mgmt)
Flags: needinfo?(khuey)
Flags: needinfo?(continuation)
Flags: needinfo?(benjamin)
Flags: needinfo?(khuey)
This is just OOM, right?  Which is why it shows up most on 32-bit Windows, where we're constrained by virtual address space and not physical memory.  Did we just move other signatures to this one in 28?
Yeah, my guess would be that some other allocation point was converted to being fallible, and so things just ended up in this bucket now.  This is probably dying when it tries to allocate a new EdgePool block, which is around 65KB.  We could probably cut the size of the blocks by, I don't know, 4 or 8 or something, which might help, or might just kick the can somewhere else.
Flags: needinfo?(continuation)
(cut the size by a factor of 4 or 8, I meant)
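
The hypothesis above (an earlier allocation site became fallible, so the abort now lands at the next infallible site) can be sketched with a toy model. This is a minimal Python illustration, not Mozilla code: `ToyHeap`, the byte budget, and the site names `image_buffer` and `edge_pool_block` are all hypothetical, and only the 65536-byte block size comes from this bug.

```python
class OutOfMemoryAbort(Exception):
    """Stands in for the process abort on the infallible allocation path."""


class ToyHeap:
    """Hypothetical heap with a fixed byte budget."""

    def __init__(self, budget):
        self.remaining = budget

    def try_alloc(self, size):
        """Fallible allocation: True on success, False when memory runs out."""
        if size > self.remaining:
            return False
        self.remaining -= size
        return True

    def xalloc(self, size, site):
        """Infallible allocation: abort, tagged with the call site, on failure."""
        if not self.try_alloc(size):
            raise OutOfMemoryAbort(site)


def crash_bucket(heap, first_site_is_fallible):
    """Return the site an OOM abort would be bucketed under, or None."""
    try:
        if first_site_is_fallible:
            heap.try_alloc(1_000_000)  # failure is tolerated by the caller
        else:
            heap.xalloc(1_000_000, "image_buffer")
        heap.xalloc(65536, "edge_pool_block")
    except OutOfMemoryAbort as abort:
        return abort.args[0]
    return None


print(crash_bucket(ToyHeap(50_000), first_site_is_fallible=False))
print(crash_bucket(ToyHeap(50_000), first_site_is_fallible=True))
```

In this model the process is short of memory in both runs; making the first site fallible only changes which signature the crash is filed under.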
Benjamin - what's your take here?  How worrisome does this seem for GA release?  Is the cutting of block size something safe to do at this stage? (we've already gone to build on beta 9, fyi)
The allocations here are all 65536 or 163864. Cutting the block size isn't going to help in any meaningful way for allocations less than 1MB.
Flags: needinfo?(benjamin)
This is still the #5 top-crash for Firefox 28 accounting for 1.44% of all crashes. Unfortunately I don't think we'll get this resolved for Firefox 28 given that it releases tomorrow morning.
Anthony, does it look as bad in 29? Thanks!
Flags: needinfo?(anthony.s.hughes)
(In reply to Sylvestre Ledru [:sylvestre] from comment #21)
> Anthony, does it look as bad in 29? Thanks!

Crashes per Install:
====================
Firefox 28.0 	4558:4475 => 1.019
Firefox 29.0b* 	2880:2642 => 1.090

It's hard to say based on this data but I'd guess this is just as bad on 29 as it was on 28. FWIW, this sits at #10 (1.23%) on Firefox 29 currently.
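
The ratios quoted above appear to be plain divisions. A short Python check follows, assuming each pair reads as crashes:installs (an assumption based on the "Crashes per Install" heading); the function name is illustrative.

```python
def crashes_per_install(crashes, installs):
    """Ratio of crash reports to distinct installs, rounded to 3 places."""
    return round(crashes / installs, 3)


# Figures copied from the comment above.
print("Firefox 28.0  ", crashes_per_install(4558, 4475))
print("Firefox 29.0b*", crashes_per_install(2880, 2642))
```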
Flags: needinfo?(anthony.s.hughes)
Thanks Anthony!

Benjamin, do you see anything we could do here to mitigate this issue? Thanks
Flags: needinfo?(benjamin)
This is OOM. The thing we need to do is figure out why we're running out of memory and fix that. We don't know how to do that, and this bug isn't going to help.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(benjamin)
Resolution: --- → WONTFIX
Flagging as wontfix as per comment 24.
I joined so I can report something with regards to this bug. I notice it is now #8 on the explosiveness report for Release 29 and also the submitted comments and occurrence is increasing. I had noticed this bug myself when 28 came out and tried multiple fixes (which managed to reduce it happening but still happened 2-3 times a day).

I then started using Aurora 31 and found this bug does not show up. My question is: was something done to fix this in 31? If not, then something was done that "accidentally" fixed or improved it.

I have not found anything mentioned or discussed regarding this so felt I needed to let someone involved know.
(In reply to tntschiff from comment #26)
> I then started using Aurora 31 and found this bug does not show up. 

I'm glad this isn't reproducing for you as much anymore. I checked our crash-stats and I see a few reports with Aurora 31 builds in the last few days so this certainly is still an issue for some people. 

However, as Benjamin mentioned in comment 24, this crash is triggered by out-of-memory. I know we are constantly landing fixes to improve memory and performance. It stands to reason that those improvements may reduce the likelihood of encountering an out-of-memory trigger.

I suspect whatever OOM trigger you were encountering has been mitigated (perhaps unintentionally) through the ongoing work to improve memory performance.
I have been monitoring this bug but as of the 16th of June it has gone completely missing. No updates, no comments, no reports, nothing on the explosiveness reports, nada. Is there a reason, or has it been resolved? Thanks
This crash happens when we run out of memory, so there's nothing obvious that can be done with just a crash stack.  There is ongoing work on improving the memory usage of Firefox, particularly in low-memory situations on 32-bit Windows.