Closed Bug 1094953 Opened 9 years ago Closed 9 years ago

crash in js::gc::GCRuntime::maybeGC(JS::Zone*)

Categories

(Core :: JavaScript: GC, defect)

36 Branch
x86
Windows NT
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla38
Tracking Status
firefox36 --- affected
firefox38 --- fixed

People

(Reporter: jbecerra, Assigned: bobowen)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-1fca1c1e-ccc1-495d-9bf8-9f8c92141104.
=============================================================

This is being reported mostly on Win7 and Win8.1 installations. There are a couple of comments by the same person indicating that moving the color picker add-on has crashed the Nightly more than once. 

"All I done was to open customize then try to move the color picker addon that I had only recently added to my new FireFox Nightly. I've had nightly and Aurora plus other versions but last time I had nightly I was always having problems with Ghostery and anonymox"

I haven't tried to reproduce this.

More reports at: https://crash-stats.mozilla.com/report/list?product=Firefox&signature=js%3A%3Agc%3A%3AGCRuntime%3A%3AmaybeGC%28JS%3A%3AZone%2A%29

0 	xul.dll 	js::gc::GCRuntime::maybeGC(JS::Zone*) 	js/src/jsgc.cpp
1 	xul.dll 	mozilla::dom::AutoEntryScript::~AutoEntryScript() 	dom/base/ScriptSettings.cpp
2 	xul.dll 	nsXPCWrappedJSClass::DelegatedQueryInterface(nsXPCWrappedJS*, nsID const&, void**) 	js/xpconnect/src/XPCWrappedJSClass.cpp
3 	xul.dll 	nsXPCWrappedJS::QueryInterface(nsID const&, void**) 	js/xpconnect/src/XPCWrappedJS.cpp
4 	xul.dll 	nsWeakReference::QueryReferent(nsID const&, void**) 	xpcom/glue/nsWeakReference.cpp
5 	xul.dll 	nsQueryReferent::operator()(nsID const&, void**) 	xpcom/glue/nsWeakReference.cpp
6 	xul.dll 	nsObserverList::FillObserverArray(nsCOMArray<nsIObserver>&) 	xpcom/ds/nsObserverList.cpp
7 	xul.dll 	nsObserverList::NotifyObservers(nsISupports*, char const*, wchar_t const*) 	xpcom/ds/nsObserverList.cpp
8 	xul.dll 	nsObserverService::NotifyObservers(nsISupports*, char const*, wchar_t const*) 	xpcom/ds/nsObserverService.cpp
9 	xul.dll 	nsContentSink::NotifyDocElementCreated(nsIDocument*) 	dom/base/nsContentSink.cpp
10 	xul.dll 	nsDocElementCreatedNotificationRunner::Run() 	dom/base/nsDocElementCreatedNotificationRunner.h
11 	xul.dll 	nsHTMLDocument::EndUpdate(unsigned int) 	dom/html/nsHTMLDocument.cpp
12 	xul.dll 	nsHtml5DocumentBuilder::EndDocUpdate() 	parser/html/nsHtml5DocumentBuilder.h
13 	xul.dll 	nsHtml5DocumentBuilder::UpdateStyleSheet(nsIContent*) 	parser/html/nsHtml5DocumentBuilder.cpp
14 	xul.dll 	nsHtml5TreeOperation::Perform(nsHtml5TreeOpExecutor*, nsIContent**) 	parser/html/nsHtml5TreeOperation.cpp
15 	xul.dll 	nsHtml5TreeOpExecutor::RunFlushLoop() 	parser/html/nsHtml5TreeOpExecutor.cpp
16 	xul.dll 	nsHtml5ExecutorFlusher::Run() 	parser/html/nsHtml5StreamParser.cpp
17 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp
18 	xul.dll 	NS_ProcessNextEvent(nsIThread*, bool) 	xpcom/glue/nsThreadUtils.cpp
19 	xul.dll 	mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp
20 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc
21 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc
22 	xul.dll 	nsBaseAppShell::Run() 	widget/nsBaseAppShell.cpp
23 	xul.dll 	nsAppShell::Run() 	widget/windows/nsAppShell.cpp
24 	xul.dll 	nsAppStartup::Run() 	toolkit/components/startup/nsAppStartup.cpp
25 	xul.dll 	XREMain::XRE_mainRun() 	toolkit/xre/nsAppRunner.cpp
26 	xul.dll 	XREMain::XRE_main(int, char** const, nsXREAppData const*) 	toolkit/xre/nsAppRunner.cpp
27 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp
28 	firefox.exe 	do_main 	browser/app/nsBrowserApp.cpp
29 	firefox.exe 	NS_internal_main(int, char**) 	browser/app/nsBrowserApp.cpp
30 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp
31 	firefox.exe 	__tmainCRTStartup 	f:/dd/vctools/crt/crtw32/startup/crt0.c:255
32 	kernel32.dll 	BaseThreadInitThunk 	
33 	ntdll.dll 	__RtlUserThreadStart 	
34 	ntdll.dll 	_RtlUserThreadStart
x86 and x64 got different stacks due to inlining, but it's the same issue: |cx->zone()| is null in JS_MaybeGC.
Crash Signature: [@ js::gc::GCRuntime::maybeGC(JS::Zone*)] → [@ js::gc::GCRuntime::maybeGC(JS::Zone*)] [@ mozilla::dom::AutoEntryScript::~AutoEntryScript()]
Crash Signature: [@ js::gc::GCRuntime::maybeGC(JS::Zone*)] [@ mozilla::dom::AutoEntryScript::~AutoEntryScript()] → [@ js::gc::GCRuntime::maybeGC(JS::Zone*)] [@ mozilla::dom::AutoEntryScript::~AutoEntryScript()] [@ JS_MaybeGC(JSContext*) ]
> x86 and x64 got different stacks due to inlining, but it's the same issue:
> |cx->zone()| is null in JS_MaybeGC.

I had put bholley on CC because of the duplicate, but maybe this is bug should be Terrence's team?

Adding the three signatures this is roughly #18 crash on 36 nightly. Started semi-spiking around 20141020030200 or 20141021030208.
Terrence, any idea why the null cx->zone()?

Wish I had better data about particular BuildIDs for you. There were a couple of upticks but mostly this signature just crept up over time.
Flags: needinfo?(terrence)
I don't think we've changed our usage of zone in maybeGC recently, so my money is on the context<->zone/compartment link and some change there.

From [1] it seems like we set cx->zone() when we enter the compartment if the compartment zone is set. But JSCompartment::zone_ is only ever set at [2], during initialization -- it can't change or be nullptr, JS::NewCompartment will return nullptr if it is (modulo compiler bugs). So, [2] and NewCompartment makes me pretty sure that comp->zone() != null is an invariant, but [1] very clearly makes it seem like that nullptr is an option. That seems extremely fishy to me. At the very least we should buttress this with some assertions.

That said, we're still pulling the zone off the context, and there is a way for that to be null: if we have not called setCompartment. Ignoring compiler failures (I guess this is windows only though), I think this is the only possible way for cx->zone() == nullptr. I'll try to get some assertions on this tomorrow, but I don't have high hope that that will turn anything up. Maybe Bobby will have some ideas from the browser side we can look at too?

1 - http://dxr.mozilla.org/mozilla-central/source/js/src/jscntxtinlines.h#439
2 - http://dxr.mozilla.org/mozilla-central/source/js/src/jscompartment.cpp#42
Flags: needinfo?(terrence) → needinfo?(bobbyholley)
My guess is that nativeGlobal->GetGlobalJSObject() is null in nsXPCWrappedJSClass::DelegatedQueryInterface. I'm pretty sure that the code in AutoEntryScript will then pass nativeGlobal on to AutoJSAPI::InitInternal, which will then do:

mAutoNullableCompartment.emplace(mCx, global);

The short-term fix is probably to NS_ENSURE_TRUE(nativeGlobal->GetGlobalJSObject(), NS_ERROR_FAILURE) in DelegatedQueryInterface.

The proper fix is to switch AutoEntryScript to a fallible API like AutoJSAPI. jimb was supposedly doing that many months ago, but I'm not sure where that went. Jim?
Flags: needinfo?(bobbyholley) → needinfo?(jimb)
Just ran into this on OS X nightly with e10s; the content process crashed just as I clicked on a link. https://crash-stats.mozilla.com/report/index/e47b20c2-f26a-4b34-9fd1-e5f252141216
(In reply to Bobby Holley (Busy with media, don't ask for DOM/JS/XPConnect things) from comment #6)
> The proper fix is to switch AutoEntryScript to a fallible API like
> AutoJSAPI. jimb was supposedly doing that many months ago, but I'm not sure
> where that went. Jim?

I've just been buried. I'll try to bring this effort back to life after vacation (Jan 2nd).
Flags: needinfo?(jimb)
Assignee: nobody → jimb
I crashed with same sig and at least partly same stack with e10s enabled 
bp-3152001b-96d7-4a3d-a265-be9d62150125

Unknown if this is related - several hours prior to submitting the crash all tabs have "Tab crashed...We tried to display this Web page, but it's not responding."
#3 top crasher for all processes, #1 top crashes in content. Any progress on this?
(In reply to Bobby Holley (Busy with media, don't ask for DOM/JS/XPConnect things) from comment #6)
> My guess is that nativeGlobal->GetGlobalJSObject() is null in
> nsXPCWrappedJSClass::DelegatedQueryInterface. I'm pretty sure that the code
> in AutoEntryScript will then pass nativeGlobal on to
> AutoJSAPI::InitInternal, which will then do:
> 
> mAutoNullableCompartment.emplace(mCx, global);

If nativeGlobal->GetGlobalJSObject() is null, then we will end up in the null compartment and have a null zone in GCRuntime::maybeGC, which would cause this crash.

So, I thought I'd try dropping in the NS_ENSURE_TRUE as a short term fix, given that it will be functionally the same as the fallible AutoEntryScript.

However, given that we are deriving nativeGlobal from the JS global [1] in the first place I don't see quite how that is happening.
Sorry for the XPConnect question at the moment.

[1] https://dxr.mozilla.org/mozilla-central/source/js/xpconnect/src/XPCWrappedJSClass.cpp#511
Flags: needinfo?(bobbyholley)
(In reply to Bob Owen (:bobowen) from comment #11)
> (In reply to Bobby Holley (Busy with media, don't ask for DOM/JS/XPConnect
> things) from comment #6)
> > My guess is that nativeGlobal->GetGlobalJSObject() is null in
> > nsXPCWrappedJSClass::DelegatedQueryInterface. I'm pretty sure that the code
> > in AutoEntryScript will then pass nativeGlobal on to
> > AutoJSAPI::InitInternal, which will then do:
> > 
> > mAutoNullableCompartment.emplace(mCx, global);
> 
> If nativeGlobal->GetGlobalJSObject() is null, then we will end up in the
> null compartment and have a null zone in GCRuntime::maybeGC, which would
> cause this crash.
> 
> So, I thought I'd try dropping in the NS_ENSURE_TRUE as a short term fix,
> given that it will be functionally the same as the fallible AutoEntryScript.
> 
> However, given that we are deriving nativeGlobal from the JS global [1] in
> the first place I don't see quite how that is happening.
> Sorry for the XPConnect question at the moment.
> 
> [1]
> https://dxr.mozilla.org/mozilla-central/source/js/xpconnect/src/
> XPCWrappedJSClass.cpp#511

IIRC some of the ContentFrameMessageManager stuff could return null during teardown from GetGlobalJSObject() even if the object was still live. Maybe the same with Window too? Not sure, but don't have time to dig into it right now, sorry.
Flags: needinfo?(bobbyholley)
(In reply to Bobby Holley (Busy with media, don't ask for DOM/JS/XPConnect things) from comment #12)

> IIRC some of the ContentFrameMessageManager stuff could return null during
> teardown from GetGlobalJSObject() even if the object was still live. Maybe
> the same with Window too? Not sure, but don't have time to dig into it right
> now, sorry.

Thanks Bobby.
TabChildGlobal::GetGlobalJSObject looks like it is the culprit here.
It actually gets its JS global from its member mTabChild [1].
I'm guessing there are circumstances where cycle collection has occurred at [2], but something is still trying to use the global.
The link still exists from the JS to the native global, but not the other way round and [1] returns null.

I dropped a conditional break point on [1] and managed to catch this a couple of times in opt and debug.
The debug builds assert earlier in the AutoJSAPI constructor, but the opt builds give this bug's crash.

I've also added a check for |nativeGlobal| as this is what will happen in the fallible Init once we have that.

Try push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=350066239c54

[1] https://dxr.mozilla.org/mozilla-central/source/dom/ipc/TabChild.cpp#3618
[2] https://dxr.mozilla.org/mozilla-central/source/dom/ipc/TabChild.cpp#3532
Attachment #8560905 - Flags: review?(bobbyholley)
Comment on attachment 8560905 [details] [diff] [review]
Ensure that GetGlobalJSObject on the native global does not return null in nsXPCWrappedJSClass::DelegatedQueryInterface.

Review of attachment 8560905 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the detailed investigation! Glad we got this sorted out. Fallible AutoEntryScript constructors cannot come soon enough.
Attachment #8560905 - Flags: review?(bobbyholley) → review+
Assignee: jimb → bobowen.code
https://hg.mozilla.org/mozilla-central/rev/46627b312b8d
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
You need to log in before you can comment on or make changes to this bug.