Closed Bug 1094953 Opened 10 years ago Closed 10 years ago

crash in js::gc::GCRuntime::maybeGC(JS::Zone*)

Categories

(Core :: JavaScript: GC, defect)

36 Branch
x86
Windows NT
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla38
Tracking Status
firefox36 --- affected
firefox38 --- fixed

People

(Reporter: jbecerra, Assigned: bobowen)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is report bp-1fca1c1e-ccc1-495d-9bf8-9f8c92141104. ============================================================= This is being reported mostly on Win7 and Win8.1 installations. There are a couple of comments by the same person indicating that moving the color picker add-on has crashed the Nightly more than once. "All I done was to open customize then try to move the color picker addon that I had only recently added to my new FireFox Nightly. I've had nightly and Aurora plus other versions but last time I had nightly I was always having problems with Ghostery and anonymox" I haven't tried to reproduce this. More reports at: https://crash-stats.mozilla.com/report/list?product=Firefox&signature=js%3A%3Agc%3A%3AGCRuntime%3A%3AmaybeGC%28JS%3A%3AZone%2A%29 0 xul.dll js::gc::GCRuntime::maybeGC(JS::Zone*) js/src/jsgc.cpp 1 xul.dll mozilla::dom::AutoEntryScript::~AutoEntryScript() dom/base/ScriptSettings.cpp 2 xul.dll nsXPCWrappedJSClass::DelegatedQueryInterface(nsXPCWrappedJS*, nsID const&, void**) js/xpconnect/src/XPCWrappedJSClass.cpp 3 xul.dll nsXPCWrappedJS::QueryInterface(nsID const&, void**) js/xpconnect/src/XPCWrappedJS.cpp 4 xul.dll nsWeakReference::QueryReferent(nsID const&, void**) xpcom/glue/nsWeakReference.cpp 5 xul.dll nsQueryReferent::operator()(nsID const&, void**) xpcom/glue/nsWeakReference.cpp 6 xul.dll nsObserverList::FillObserverArray(nsCOMArray<nsIObserver>&) xpcom/ds/nsObserverList.cpp 7 xul.dll nsObserverList::NotifyObservers(nsISupports*, char const*, wchar_t const*) xpcom/ds/nsObserverList.cpp 8 xul.dll nsObserverService::NotifyObservers(nsISupports*, char const*, wchar_t const*) xpcom/ds/nsObserverService.cpp 9 xul.dll nsContentSink::NotifyDocElementCreated(nsIDocument*) dom/base/nsContentSink.cpp 10 xul.dll nsDocElementCreatedNotificationRunner::Run() dom/base/nsDocElementCreatedNotificationRunner.h 11 xul.dll nsHTMLDocument::EndUpdate(unsigned int) dom/html/nsHTMLDocument.cpp 12 xul.dll nsHtml5DocumentBuilder::EndDocUpdate() parser/html/nsHtml5DocumentBuilder.h 13 xul.dll nsHtml5DocumentBuilder::UpdateStyleSheet(nsIContent*) parser/html/nsHtml5DocumentBuilder.cpp 14 xul.dll nsHtml5TreeOperation::Perform(nsHtml5TreeOpExecutor*, nsIContent**) parser/html/nsHtml5TreeOperation.cpp 15 xul.dll nsHtml5TreeOpExecutor::RunFlushLoop() parser/html/nsHtml5TreeOpExecutor.cpp 16 xul.dll nsHtml5ExecutorFlusher::Run() parser/html/nsHtml5StreamParser.cpp 17 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp 18 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/glue/nsThreadUtils.cpp 19 xul.dll mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp 20 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc 21 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc 22 xul.dll nsBaseAppShell::Run() widget/nsBaseAppShell.cpp 23 xul.dll nsAppShell::Run() widget/windows/nsAppShell.cpp 24 xul.dll nsAppStartup::Run() toolkit/components/startup/nsAppStartup.cpp 25 xul.dll XREMain::XRE_mainRun() toolkit/xre/nsAppRunner.cpp 26 xul.dll XREMain::XRE_main(int, char** const, nsXREAppData const*) toolkit/xre/nsAppRunner.cpp 27 xul.dll XRE_main toolkit/xre/nsAppRunner.cpp 28 firefox.exe do_main browser/app/nsBrowserApp.cpp 29 firefox.exe NS_internal_main(int, char**) browser/app/nsBrowserApp.cpp 30 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp 31 firefox.exe __tmainCRTStartup f:/dd/vctools/crt/crtw32/startup/crt0.c:255 32 kernel32.dll BaseThreadInitThunk 33 ntdll.dll __RtlUserThreadStart 34 ntdll.dll _RtlUserThreadStart
x86 and x64 got different stacks due to inlining, but it's the same issue: |cx->zone()| is null in JS_MaybeGC.
Crash Signature: [@ js::gc::GCRuntime::maybeGC(JS::Zone*)] → [@ js::gc::GCRuntime::maybeGC(JS::Zone*)] [@ mozilla::dom::AutoEntryScript::~AutoEntryScript()]
Crash Signature: [@ js::gc::GCRuntime::maybeGC(JS::Zone*)] [@ mozilla::dom::AutoEntryScript::~AutoEntryScript()] → [@ js::gc::GCRuntime::maybeGC(JS::Zone*)] [@ mozilla::dom::AutoEntryScript::~AutoEntryScript()] [@ JS_MaybeGC(JSContext*) ]
> x86 and x64 got different stacks due to inlining, but it's the same issue: > |cx->zone()| is null in JS_MaybeGC. I had put bholley on CC because of the duplicate, but maybe this is bug should be Terrence's team? Adding the three signatures this is roughly #18 crash on 36 nightly. Started semi-spiking around 20141020030200 or 20141021030208.
Terrence, any idea why the null cx->zone()? Wish I had better data about particular BuildIDs for you. There were a couple of upticks but mostly this signature just crept up over time.
Flags: needinfo?(terrence)
I don't think we've changed our usage of zone in maybeGC recently, so my money is on the context<->zone/compartment link and some change there. From [1] it seems like we set cx->zone() when we enter the compartment if the compartment zone is set. But JSCompartment::zone_ is only ever set at [2], during initialization -- it can't change or be nullptr, JS::NewCompartment will return nullptr if it is (modulo compiler bugs). So, [2] and NewCompartment makes me pretty sure that comp->zone() != null is an invariant, but [1] very clearly makes it seem like that nullptr is an option. That seems extremely fishy to me. At the very least we should buttress this with some assertions. That said, we're still pulling the zone off the context, and there is a way for that to be null: if we have not called setCompartment. Ignoring compiler failures (I guess this is windows only though), I think this is the only possible way for cx->zone() == nullptr. I'll try to get some assertions on this tomorrow, but I don't have high hope that that will turn anything up. Maybe Bobby will have some ideas from the browser side we can look at too? 1 - http://dxr.mozilla.org/mozilla-central/source/js/src/jscntxtinlines.h#439 2 - http://dxr.mozilla.org/mozilla-central/source/js/src/jscompartment.cpp#42
Flags: needinfo?(terrence) → needinfo?(bobbyholley)
My guess is that nativeGlobal->GetGlobalJSObject() is null in nsXPCWrappedJSClass::DelegatedQueryInterface. I'm pretty sure that the code in AutoEntryScript will then pass nativeGlobal on to AutoJSAPI::InitInternal, which will then do: mAutoNullableCompartment.emplace(mCx, global); The short-term fix is probably to NS_ENSURE_TRUE(nativeGlobal->GetGlobalJSObject(), NS_ERROR_FAILURE) in DelegatedQueryInterface. The proper fix is to switch AutoEntryScript to a fallible API like AutoJSAPI. jimb was supposedly doing that many months ago, but I'm not sure where that went. Jim?
Flags: needinfo?(bobbyholley) → needinfo?(jimb)
Just ran into this on OS X nightly with e10s; the content process crashed just as I clicked on a link. https://crash-stats.mozilla.com/report/index/e47b20c2-f26a-4b34-9fd1-e5f252141216
(In reply to Bobby Holley (Busy with media, don't ask for DOM/JS/XPConnect things) from comment #6) > The proper fix is to switch AutoEntryScript to a fallible API like > AutoJSAPI. jimb was supposedly doing that many months ago, but I'm not sure > where that went. Jim? I've just been buried. I'll try to bring this effort back to life after vacation (Jan 2nd).
Flags: needinfo?(jimb)
Assignee: nobody → jimb
I crashed with same sig and at least partly same stack with e10s enabled bp-3152001b-96d7-4a3d-a265-be9d62150125 Unknown if this is related - several hours prior to submitting the crash all tabs have "Tab crashed...We tried to display this Web page, but it's not responding."
#3 top crasher for all processes, #1 top crashes in content. Any progress on this?
(In reply to Bobby Holley (Busy with media, don't ask for DOM/JS/XPConnect things) from comment #6) > My guess is that nativeGlobal->GetGlobalJSObject() is null in > nsXPCWrappedJSClass::DelegatedQueryInterface. I'm pretty sure that the code > in AutoEntryScript will then pass nativeGlobal on to > AutoJSAPI::InitInternal, which will then do: > > mAutoNullableCompartment.emplace(mCx, global); If nativeGlobal->GetGlobalJSObject() is null, then we will end up in the null compartment and have a null zone in GCRuntime::maybeGC, which would cause this crash. So, I thought I'd try dropping in the NS_ENSURE_TRUE as a short term fix, given that it will be functionally the same as the fallible AutoEntryScript. However, given that we are deriving nativeGlobal from the JS global [1] in the first place I don't see quite how that is happening. Sorry for the XPConnect question at the moment. [1] https://dxr.mozilla.org/mozilla-central/source/js/xpconnect/src/XPCWrappedJSClass.cpp#511
Flags: needinfo?(bobbyholley)
(In reply to Bob Owen (:bobowen) from comment #11) > (In reply to Bobby Holley (Busy with media, don't ask for DOM/JS/XPConnect > things) from comment #6) > > My guess is that nativeGlobal->GetGlobalJSObject() is null in > > nsXPCWrappedJSClass::DelegatedQueryInterface. I'm pretty sure that the code > > in AutoEntryScript will then pass nativeGlobal on to > > AutoJSAPI::InitInternal, which will then do: > > > > mAutoNullableCompartment.emplace(mCx, global); > > If nativeGlobal->GetGlobalJSObject() is null, then we will end up in the > null compartment and have a null zone in GCRuntime::maybeGC, which would > cause this crash. > > So, I thought I'd try dropping in the NS_ENSURE_TRUE as a short term fix, > given that it will be functionally the same as the fallible AutoEntryScript. > > However, given that we are deriving nativeGlobal from the JS global [1] in > the first place I don't see quite how that is happening. > Sorry for the XPConnect question at the moment. > > [1] > https://dxr.mozilla.org/mozilla-central/source/js/xpconnect/src/ > XPCWrappedJSClass.cpp#511 IIRC some of the ContentFrameMessageManager stuff could return null during teardown from GetGlobalJSObject() even if the object was still live. Maybe the same with Window too? Not sure, but don't have time to dig into it right now, sorry.
Flags: needinfo?(bobbyholley)
(In reply to Bobby Holley (Busy with media, don't ask for DOM/JS/XPConnect things) from comment #12) > IIRC some of the ContentFrameMessageManager stuff could return null during > teardown from GetGlobalJSObject() even if the object was still live. Maybe > the same with Window too? Not sure, but don't have time to dig into it right > now, sorry. Thanks Bobby. TabChildGlobal::GetGlobalJSObject looks like it is the culprit here. It actually gets its JS global from its member mTabChild [1]. I'm guessing there are circumstances where cycle collection has occurred at [2], but something is still trying to use the global. The link still exists from the JS to the native global, but not the other way round and [1] returns null. I dropped a conditional break point on [1] and managed to catch this a couple of times in opt and debug. The debug builds assert earlier in the AutoJSAPI constructor, but the opt builds give this bug's crash. I've also added a check for |nativeGlobal| as this is what will happen in the fallible Init once we have that. Try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=350066239c54 [1] https://dxr.mozilla.org/mozilla-central/source/dom/ipc/TabChild.cpp#3618 [2] https://dxr.mozilla.org/mozilla-central/source/dom/ipc/TabChild.cpp#3532
Attachment #8560905 - Flags: review?(bobbyholley)
Comment on attachment 8560905 [details] [diff] [review] Ensure that GetGlobalJSObject on the native global does not return null in nsXPCWrappedJSClass::DelegatedQueryInterface. Review of attachment 8560905 [details] [diff] [review]: ----------------------------------------------------------------- Thanks for the detailed investigation! Glad we got this sorted out. Fallible AutoEntryScript constructors cannot come soon enough.
Attachment #8560905 - Flags: review?(bobbyholley) → review+
Assignee: jimb → bobowen.code
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: