Closed Bug 991746 Opened 10 years ago Closed 8 years ago

crash in js::Nursery::MinorGCCallback(JSTracer*, void**, JSGCTraceKind)


(Core :: JavaScript: GC, defect)

31 Branch
Windows NT
Not set



Tracking Status
firefox31 - affected


(Reporter: lizzard, Unassigned)



(Keywords: crash, Whiteboard: [GGC])

Crash Data

This bug was filed from the Socorro interface and is 
report bp-f9f5ed54-a835-4415-9ac9-def992140402.

This is the #6 topcrash for Firefox 31.0a1 with 232 out of 5697 crashes.
It first appeared in the 2014032903 build.
Terrence, can you please look into this? It is currently #6 @ 2.01% in Firefox Nightly 31.0a1.

More reports:

Sample Stack:
0 	mozjs.dll 	js::Nursery::MinorGCCallback(JSTracer *,void * *,JSGCTraceKind) 	js/src/gc/Nursery.cpp
1 	mozjs.dll 	MarkValueInternal 	js/src/gc/Marking.cpp
2 	mozjs.dll 	js::gc::StoreBuffer::MonoTypeBuffer<js::gc::StoreBuffer::SlotEdge>::mark(js::gc::StoreBuffer *,JSTracer *) 	js/src/gc/StoreBuffer.cpp
3 	mozjs.dll 	js::Nursery::collect(JSRuntime *,JS::gcreason::Reason,js::Vector<js::types::TypeObject *,0,js::SystemAllocPolicy> *) 	js/src/gc/Nursery.cpp
4 	mozjs.dll 	Collect 	js/src/jsgc.cpp
5 	mozjs.dll 	js::GCSlice(JSRuntime *,js::JSGCInvocationKind,JS::gcreason::Reason,__int64) 	js/src/jsgc.cpp
6 	mozjs.dll 	JS::NotifyDidPaint(JSRuntime *) 	js/src/jsfriendapi.cpp
7 	xul.dll 	nsXPConnect::NotifyDidPaint() 	js/xpconnect/src/nsXPConnect.cpp
8 	xul.dll 	PresShell::DidPaintWindow() 	layout/base/nsPresShell.cpp
9 	xul.dll 	nsViewManager::DidPaintWindow() 	view/src/nsViewManager.cpp
10 	xul.dll 	nsView::DidPaintWindow() 	view/src/nsView.cpp
11 	xul.dll 	nsWindow::OnPaint(HDC__ *,unsigned int) 	widget/windows/nsWindowGfx.cpp
12 	xul.dll 	nsWindow::ProcessMessage(unsigned int,unsigned int &,long &,long *) 	widget/windows/nsWindow.cpp
13 	xul.dll 	nsWindow::WindowProcInternal(HWND__ *,unsigned int,unsigned int,long) 	widget/windows/nsWindow.cpp
14 	xul.dll 	CallWindowProcCrashProtected 	xpcom/base/nsCrashOnException.cpp
15 	xul.dll 	nsWindow::WindowProc(HWND__ *,unsigned int,unsigned int,long) 	widget/windows/nsWindow.cpp
16 	user32.dll 	InternalCallWinProc 	
17 	user32.dll 	GetRealWindowOwner 	
18 	user32.dll 	DispatchClientMessage 	
19 	user32.dll 	__fnDWORD 	
20 	ntdll.dll 	KiUserCallbackDispatcher 	
21 	ntdll.dll 	KiUserApcDispatcher 	
22 	user32.dll 	DispatchMessageW 	
23 	xul.dll 	nsAppShell::ProcessNextNativeEvent(bool) 	widget/windows/nsAppShell.cpp
24 	xul.dll 	nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal *,bool,unsigned int) 	widget/xpwidgets/nsBaseAppShell.cpp
25 	xul.dll 	nsThread::ProcessNextEvent(bool,bool *) 	xpcom/threads/nsThread.cpp
26 	xul.dll 	NS_ProcessNextEvent(nsIThread *,bool) 	xpcom/glue/nsThreadUtils.cpp
27 	xul.dll 	mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate *) 	ipc/glue/MessagePump.cpp
28 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/
29 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/
30 	xul.dll 	nsBaseAppShell::Run() 	widget/xpwidgets/nsBaseAppShell.cpp
31 	xul.dll 	nsAppShell::Run() 	widget/windows/nsAppShell.cpp
32 	xul.dll 	nsAppStartup::Run() 	toolkit/components/startup/nsAppStartup.cpp
33 	xul.dll 	XREMain::XRE_mainRun() 	toolkit/xre/nsAppRunner.cpp
34 	xul.dll 	XREMain::XRE_main(int,char * * const,nsXREAppData const *) 	toolkit/xre/nsAppRunner.cpp
35 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp
36 	firefox.exe 	do_main 	browser/app/nsBrowserApp.cpp
37 	firefox.exe 	NS_internal_main(int,char * *) 	browser/app/nsBrowserApp.cpp
38 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp
39 	firefox.exe 	__tmainCRTStartup 	f:/dd/vctools/crt_bld/self_x86/crt/src/crtexe.c
40 	kernel32.dll 	BaseThreadInitThunk 	
41 	ntdll.dll 	__RtlUserThreadStart 	
42 	ntdll.dll 	_RtlUserThreadStart
Flags: needinfo?(terrence)
There isn't very much useful info in the crash comments, unable to reproduce.
This is crashing within moveToTenured, which is inlined into MinorGCCallback:

void *
js::Nursery::moveToTenured(MinorCollectionTracer *trc, JSObject *src)
    Zone *zone = src->zone();
    AllocKind dstKind = GetObjectAllocKindForCopy(trc->runtime, src);

The generated assembly is:

6A852A9A  mov         eax,dword ptr [esi]  ; eax <- src->_shape
6A852A9C  mov         ebx,ecx              ; ebx <- trc
6A852A9E  mov         edx,dword ptr [ebx]  ; edx <- trc->runtime
6A852AA0  and         eax,0FFFFF000h       ; eax <- src->_shape->arenaHeader()
 ; at this point:
 ; ecx = ebx = trc
 ; eax = 0
 ; edx = rt
 ; esi = 0x06e02940 (src?)
6A852AA5  mov         ecx,dword ptr [eax]  ; ecx <- eax->zone

So what's happening is that it's trying to get the zone out of a JSObject*. It dereferences the JSObject* and reads out the shape_ and ANDs that with 0xfffff000 to get to its ArenaHeader where it will read out the zone. Except that the Shape* after ANDing is 0x0. Which probably means it was 0x0 before ANDing as well.

In other words, we have a corrupt JSObject with a NULL shape_ field.
Note that frame 3 of the stack points to the wrong line number in js::Nursery::collect (both in the stack in comment 1 and the VS when viewing the minidump.) But I assume the call to StoreBuffer::MonoTypeBuffer<js::gc::StoreBuffer::SlotEdge>::mark tells where it is, really.
The comments all point to soundcloud. So perhaps it is related to bug 992535.
Though note that for this crash, gcreason == REFRESH_FRAME, not a full store buffer.
I can also confirm that it happened to me while SoundCloud was playing (without further user interaction); I’m unable to reproduce it either though.
There is a new signature in the crashstats from today (with yesterday's SlotEdge fix), but it is a WholeObjectEdge, so a different bug than the one fixed here.
Flags: needinfo?(terrence)
(In reply to Terrence Cole [:terrence] from comment #8)
> There is a new signature in the crashstats from today (with yesterday's
> SlotEdge fix), but it is a WholeObjectEdge, so a different bug than the one
> fixed here.

Is this new signature reported already?
As with all GC crashes, these are happening long, long after the actual point of failure. Which particular site we crash at may or may not have anything to do with the actual failure.
Blocks: 994589
Since bug 992535 landed, we have one crash with this signature on the 10 Apr build at an attempt to deref 0x2b2b2000. Maybe a swept nursery pointer masked to get the arena? I'm not sure why we'd be marking one of that, but I can't think of another reason we might have the masking here. This is still coming straight out of the slots buffer, which would indicate we may still have a bug similar to bug 992535 floating around somewhere, just a bit harder to get to.
This is no longer a topcrash for 31. There is one crash from the 20140422030204 build and one from 20140418030202. Crashes dropped off dramatically after 2014040803 which is where the fix for bug 992535 landed. Just confirming what Terence said!

I'm not sure that means the bug should be closed. Juan, do you think that taking the "topcrash" keywords would be the right thing to do? If so, do we still need to track it?
Flags: needinfo?(jbecerra)
Flags: needinfo?(jbecerra)
As Liz proposed (cf comment 12), I am going to untrack it for 31.
(In reply to Sylvestre Ledru [:sylvestre] from comment #13)
> As Liz proposed (cf comment 12), I am going to untrack it for 31.

We should only do this if we are really sure this is solved. For one thing, GGC crashes can all easily be security issues and for the other, we paid dearly for early untracking in bug 970483.
As long as we don't remove the topcrash keyword and the status flag this will remain on QA's radar. That said, crashes which can't be categorically verified fixed should have SLA for tracking. It was 3.5 weeks between Liz declaring this significantly dropped and and Sylvestre untracking it. If that's not enough time to pass then what is? Should it ever be okay to untrack an unverifiable topcrash? Perhaps this bug isn't the right place to have that discussion though.
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #15)
> As long as we don't remove the topcrash keyword and the status flag this
> will remain on QA's radar.

What we figured with bug 970483 is that we never should take bugs off the radar just for decline in volume without having a dev assessment of the issue. But I agree, this isn't the best place to have this discussion.
Crash Signature: [@ js::Nursery::MinorGCCallback(JSTracer*, void**, JSGCTraceKind)] → [@ js::Nursery::MinorGCCallback(JSTracer*, void**, JSGCTraceKind)] [@ js::Nursery::MinorGCCallback]
No crash reports since FF39 so closing.
Closed: 8 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.