Closed
Bug 1094953
Opened 10 years ago
Closed 10 years ago
crash in js::gc::GCRuntime::maybeGC(JS::Zone*)
Categories
(Core :: JavaScript: GC, defect)
Tracking
()
RESOLVED
FIXED
mozilla38
People
(Reporter: jbecerra, Assigned: bobowen)
References
Details
(Keywords: crash)
Crash Data
Attachments
(1 file)
This bug was filed from the Socorro interface and is
report bp-1fca1c1e-ccc1-495d-9bf8-9f8c92141104.
=============================================================
This is being reported mostly on Win7 and Win8.1 installations. There are a couple of comments by the same person indicating that moving the color picker add-on has crashed the Nightly more than once.
"All I done was to open customize then try to move the color picker addon that I had only recently added to my new FireFox Nightly. I've had nightly and Aurora plus other versions but last time I had nightly I was always having problems with Ghostery and anonymox"
I haven't tried to reproduce this.
More reports at: https://crash-stats.mozilla.com/report/list?product=Firefox&signature=js%3A%3Agc%3A%3AGCRuntime%3A%3AmaybeGC%28JS%3A%3AZone%2A%29
0 xul.dll js::gc::GCRuntime::maybeGC(JS::Zone*) js/src/jsgc.cpp
1 xul.dll mozilla::dom::AutoEntryScript::~AutoEntryScript() dom/base/ScriptSettings.cpp
2 xul.dll nsXPCWrappedJSClass::DelegatedQueryInterface(nsXPCWrappedJS*, nsID const&, void**) js/xpconnect/src/XPCWrappedJSClass.cpp
3 xul.dll nsXPCWrappedJS::QueryInterface(nsID const&, void**) js/xpconnect/src/XPCWrappedJS.cpp
4 xul.dll nsWeakReference::QueryReferent(nsID const&, void**) xpcom/glue/nsWeakReference.cpp
5 xul.dll nsQueryReferent::operator()(nsID const&, void**) xpcom/glue/nsWeakReference.cpp
6 xul.dll nsObserverList::FillObserverArray(nsCOMArray<nsIObserver>&) xpcom/ds/nsObserverList.cpp
7 xul.dll nsObserverList::NotifyObservers(nsISupports*, char const*, wchar_t const*) xpcom/ds/nsObserverList.cpp
8 xul.dll nsObserverService::NotifyObservers(nsISupports*, char const*, wchar_t const*) xpcom/ds/nsObserverService.cpp
9 xul.dll nsContentSink::NotifyDocElementCreated(nsIDocument*) dom/base/nsContentSink.cpp
10 xul.dll nsDocElementCreatedNotificationRunner::Run() dom/base/nsDocElementCreatedNotificationRunner.h
11 xul.dll nsHTMLDocument::EndUpdate(unsigned int) dom/html/nsHTMLDocument.cpp
12 xul.dll nsHtml5DocumentBuilder::EndDocUpdate() parser/html/nsHtml5DocumentBuilder.h
13 xul.dll nsHtml5DocumentBuilder::UpdateStyleSheet(nsIContent*) parser/html/nsHtml5DocumentBuilder.cpp
14 xul.dll nsHtml5TreeOperation::Perform(nsHtml5TreeOpExecutor*, nsIContent**) parser/html/nsHtml5TreeOperation.cpp
15 xul.dll nsHtml5TreeOpExecutor::RunFlushLoop() parser/html/nsHtml5TreeOpExecutor.cpp
16 xul.dll nsHtml5ExecutorFlusher::Run() parser/html/nsHtml5StreamParser.cpp
17 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp
18 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/glue/nsThreadUtils.cpp
19 xul.dll mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp
20 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc
21 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc
22 xul.dll nsBaseAppShell::Run() widget/nsBaseAppShell.cpp
23 xul.dll nsAppShell::Run() widget/windows/nsAppShell.cpp
24 xul.dll nsAppStartup::Run() toolkit/components/startup/nsAppStartup.cpp
25 xul.dll XREMain::XRE_mainRun() toolkit/xre/nsAppRunner.cpp
26 xul.dll XREMain::XRE_main(int, char** const, nsXREAppData const*) toolkit/xre/nsAppRunner.cpp
27 xul.dll XRE_main toolkit/xre/nsAppRunner.cpp
28 firefox.exe do_main browser/app/nsBrowserApp.cpp
29 firefox.exe NS_internal_main(int, char**) browser/app/nsBrowserApp.cpp
30 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp
31 firefox.exe __tmainCRTStartup f:/dd/vctools/crt/crtw32/startup/crt0.c:255
32 kernel32.dll BaseThreadInitThunk
33 ntdll.dll __RtlUserThreadStart
34 ntdll.dll _RtlUserThreadStart
x86 and x64 got different stacks due to inlining, but it's the same issue: |cx->zone()| is null in JS_MaybeGC.
Crash Signature: [@ js::gc::GCRuntime::maybeGC(JS::Zone*)] → [@ js::gc::GCRuntime::maybeGC(JS::Zone*)]
[@ mozilla::dom::AutoEntryScript::~AutoEntryScript()]
Crash Signature: [@ js::gc::GCRuntime::maybeGC(JS::Zone*)]
[@ mozilla::dom::AutoEntryScript::~AutoEntryScript()] → [@ js::gc::GCRuntime::maybeGC(JS::Zone*)]
[@ mozilla::dom::AutoEntryScript::~AutoEntryScript()]
[@ JS_MaybeGC(JSContext*) ]
> x86 and x64 got different stacks due to inlining, but it's the same issue:
> |cx->zone()| is null in JS_MaybeGC.
I had put bholley on CC because of the duplicate, but maybe this is bug should be Terrence's team?
Adding the three signatures this is roughly #18 crash on 36 nightly. Started semi-spiking around 20141020030200 or 20141021030208.
Terrence, any idea why the null cx->zone()?
Wish I had better data about particular BuildIDs for you. There were a couple of upticks but mostly this signature just crept up over time.
Flags: needinfo?(terrence)
Comment 5•10 years ago
|
||
I don't think we've changed our usage of zone in maybeGC recently, so my money is on the context<->zone/compartment link and some change there.
From [1] it seems like we set cx->zone() when we enter the compartment if the compartment zone is set. But JSCompartment::zone_ is only ever set at [2], during initialization -- it can't change or be nullptr, JS::NewCompartment will return nullptr if it is (modulo compiler bugs). So, [2] and NewCompartment makes me pretty sure that comp->zone() != null is an invariant, but [1] very clearly makes it seem like that nullptr is an option. That seems extremely fishy to me. At the very least we should buttress this with some assertions.
That said, we're still pulling the zone off the context, and there is a way for that to be null: if we have not called setCompartment. Ignoring compiler failures (I guess this is windows only though), I think this is the only possible way for cx->zone() == nullptr. I'll try to get some assertions on this tomorrow, but I don't have high hope that that will turn anything up. Maybe Bobby will have some ideas from the browser side we can look at too?
1 - http://dxr.mozilla.org/mozilla-central/source/js/src/jscntxtinlines.h#439
2 - http://dxr.mozilla.org/mozilla-central/source/js/src/jscompartment.cpp#42
Flags: needinfo?(terrence) → needinfo?(bobbyholley)
Comment 6•10 years ago
|
||
My guess is that nativeGlobal->GetGlobalJSObject() is null in nsXPCWrappedJSClass::DelegatedQueryInterface. I'm pretty sure that the code in AutoEntryScript will then pass nativeGlobal on to AutoJSAPI::InitInternal, which will then do:
mAutoNullableCompartment.emplace(mCx, global);
The short-term fix is probably to NS_ENSURE_TRUE(nativeGlobal->GetGlobalJSObject(), NS_ERROR_FAILURE) in DelegatedQueryInterface.
The proper fix is to switch AutoEntryScript to a fallible API like AutoJSAPI. jimb was supposedly doing that many months ago, but I'm not sure where that went. Jim?
Flags: needinfo?(bobbyholley) → needinfo?(jimb)
Comment 7•10 years ago
|
||
Just ran into this on OS X nightly with e10s; the content process crashed just as I clicked on a link. https://crash-stats.mozilla.com/report/index/e47b20c2-f26a-4b34-9fd1-e5f252141216
Comment 8•10 years ago
|
||
(In reply to Bobby Holley (Busy with media, don't ask for DOM/JS/XPConnect things) from comment #6)
> The proper fix is to switch AutoEntryScript to a fallible API like
> AutoJSAPI. jimb was supposedly doing that many months ago, but I'm not sure
> where that went. Jim?
I've just been buried. I'll try to bring this effort back to life after vacation (Jan 2nd).
Flags: needinfo?(jimb)
Updated•10 years ago
|
Assignee: nobody → jimb
Comment 9•10 years ago
|
||
I crashed with same sig and at least partly same stack with e10s enabled
bp-3152001b-96d7-4a3d-a265-be9d62150125
Unknown if this is related - several hours prior to submitting the crash all tabs have "Tab crashed...We tried to display this Web page, but it's not responding."
Comment 10•10 years ago
|
||
#3 top crasher for all processes, #1 top crashes in content. Any progress on this?
Assignee | ||
Comment 11•10 years ago
|
||
(In reply to Bobby Holley (Busy with media, don't ask for DOM/JS/XPConnect things) from comment #6)
> My guess is that nativeGlobal->GetGlobalJSObject() is null in
> nsXPCWrappedJSClass::DelegatedQueryInterface. I'm pretty sure that the code
> in AutoEntryScript will then pass nativeGlobal on to
> AutoJSAPI::InitInternal, which will then do:
>
> mAutoNullableCompartment.emplace(mCx, global);
If nativeGlobal->GetGlobalJSObject() is null, then we will end up in the null compartment and have a null zone in GCRuntime::maybeGC, which would cause this crash.
So, I thought I'd try dropping in the NS_ENSURE_TRUE as a short term fix, given that it will be functionally the same as the fallible AutoEntryScript.
However, given that we are deriving nativeGlobal from the JS global [1] in the first place I don't see quite how that is happening.
Sorry for the XPConnect question at the moment.
[1] https://dxr.mozilla.org/mozilla-central/source/js/xpconnect/src/XPCWrappedJSClass.cpp#511
Flags: needinfo?(bobbyholley)
Comment 12•10 years ago
|
||
(In reply to Bob Owen (:bobowen) from comment #11)
> (In reply to Bobby Holley (Busy with media, don't ask for DOM/JS/XPConnect
> things) from comment #6)
> > My guess is that nativeGlobal->GetGlobalJSObject() is null in
> > nsXPCWrappedJSClass::DelegatedQueryInterface. I'm pretty sure that the code
> > in AutoEntryScript will then pass nativeGlobal on to
> > AutoJSAPI::InitInternal, which will then do:
> >
> > mAutoNullableCompartment.emplace(mCx, global);
>
> If nativeGlobal->GetGlobalJSObject() is null, then we will end up in the
> null compartment and have a null zone in GCRuntime::maybeGC, which would
> cause this crash.
>
> So, I thought I'd try dropping in the NS_ENSURE_TRUE as a short term fix,
> given that it will be functionally the same as the fallible AutoEntryScript.
>
> However, given that we are deriving nativeGlobal from the JS global [1] in
> the first place I don't see quite how that is happening.
> Sorry for the XPConnect question at the moment.
>
> [1]
> https://dxr.mozilla.org/mozilla-central/source/js/xpconnect/src/
> XPCWrappedJSClass.cpp#511
IIRC some of the ContentFrameMessageManager stuff could return null during teardown from GetGlobalJSObject() even if the object was still live. Maybe the same with Window too? Not sure, but don't have time to dig into it right now, sorry.
Flags: needinfo?(bobbyholley)
Assignee | ||
Comment 13•10 years ago
|
||
(In reply to Bobby Holley (Busy with media, don't ask for DOM/JS/XPConnect things) from comment #12)
> IIRC some of the ContentFrameMessageManager stuff could return null during
> teardown from GetGlobalJSObject() even if the object was still live. Maybe
> the same with Window too? Not sure, but don't have time to dig into it right
> now, sorry.
Thanks Bobby.
TabChildGlobal::GetGlobalJSObject looks like it is the culprit here.
It actually gets its JS global from its member mTabChild [1].
I'm guessing there are circumstances where cycle collection has occurred at [2], but something is still trying to use the global.
The link still exists from the JS to the native global, but not the other way round and [1] returns null.
I dropped a conditional break point on [1] and managed to catch this a couple of times in opt and debug.
The debug builds assert earlier in the AutoJSAPI constructor, but the opt builds give this bug's crash.
I've also added a check for |nativeGlobal| as this is what will happen in the fallible Init once we have that.
Try push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=350066239c54
[1] https://dxr.mozilla.org/mozilla-central/source/dom/ipc/TabChild.cpp#3618
[2] https://dxr.mozilla.org/mozilla-central/source/dom/ipc/TabChild.cpp#3532
Attachment #8560905 -
Flags: review?(bobbyholley)
Comment 14•10 years ago
|
||
awesome |
Comment on attachment 8560905 [details] [diff] [review]
Ensure that GetGlobalJSObject on the native global does not return null in nsXPCWrappedJSClass::DelegatedQueryInterface.
Review of attachment 8560905 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks for the detailed investigation! Glad we got this sorted out. Fallible AutoEntryScript constructors cannot come soon enough.
Attachment #8560905 -
Flags: review?(bobbyholley) → review+
Assignee | ||
Comment 15•10 years ago
|
||
Assignee | ||
Updated•10 years ago
|
Assignee: jimb → bobowen.code
Comment 16•10 years ago
|
||
Status: NEW → RESOLVED
Closed: 10 years ago
status-firefox38:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
You need to log in
before you can comment on or make changes to this bug.
Description
•