Closed Bug 1412391 Opened 8 years ago Closed 5 years ago

Crash in js::shadow::Object::slotRef

Categories

(Core :: XPConnect, defect, P1)

Unspecified
Windows 7
defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: wsmwk, Assigned: sfink)

Details

(4 keywords)

Crash Data

bp-76bb3aba-c643-490f-b13c-1f99e0171027 56.0a1 20170728100358 I wasn't at keyboard at the time. I don't crash much - maybe once per month. Previous crash was a week ago different signature bp-311849ad-3ab9-4aaf-9275-8d0480171020 0 xul.dll js::shadow::Object::slotRef(unsigned __int64) obj-firefox/dist/include/jsfriendapi.h:586 1 xul.dll xpc::JSXrayTraits::construct(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&, js::Wrapper const&) js/xpconnect/wrappers/XrayWrapper.cpp:949 2 xul.dll xpc::XrayWrapper<js::CrossCompartmentWrapper, xpc::JSXrayTraits>::construct(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) js/xpconnect/wrappers/XrayWrapper.cpp:2471 3 xul.dll js::Proxy::construct(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) js/src/proxy/Proxy.cpp:501 4 xul.dll js::proxy_Construct(JSContext*, unsigned int, JS::Value*) js/src/proxy/Proxy.cpp:750 5 xul.dll InternalConstruct js/src/vm/Interpreter.cpp:572 6 xul.dll js::ConstructFromStack(JSContext*, JS::CallArgs const&) js/src/vm/Interpreter.cpp:598 7 xul.dll Interpret js/src/vm/Interpreter.cpp:3056 8 xul.dll js::RunScript(JSContext*, js::RunState&) js/src/vm/Interpreter.cpp:409 9 xul.dll js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) js/src/vm/Interpreter.cpp:487 10 xul.dll js::Wrapper::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) js/src/proxy/Wrapper.cpp:169 11 xul.dll js::CrossCompartmentWrapper::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) js/src/proxy/CrossCompartmentWrapper.cpp:359 12 xul.dll js::Proxy::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) js/src/proxy/Proxy.cpp:481 13 xul.dll js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) js/src/vm/Interpreter.cpp:451 14 xul.dll Interpret js/src/vm/Interpreter.cpp:3064 15 xul.dll js::RunScript(JSContext*, js::RunState&) js/src/vm/Interpreter.cpp:409 16 xul.dll js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) js/src/vm/Interpreter.cpp:487 17 xul.dll js::CallSelfHostedFunction(JSContext*, JS::Handle<js::PropertyName*>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>) js/src/vm/SelfHosting.cpp:1678 18 xul.dll AsyncFunctionResume js/src/vm/AsyncFunction.cpp:192 19 xul.dll AsyncFunctionAwaitPromiseReactionJob js/src/builtin/Promise.cpp:882 20 xul.dll PromiseReactionJob js/src/builtin/Promise.cpp:962 21 xul.dll js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) js/src/vm/Interpreter.cpp:469 22 xul.dll JS::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) js/src/jsapi.cpp:2948 23 xul.dll mozilla::PromiseJobRunnable::Run() xpcom/base/CycleCollectedJSContext.cpp:209 24 xul.dll mozilla::dom::Promise::PerformMicroTaskCheckpoint() dom/promise/Promise.cpp:535 25 xul.dll mozilla::CycleCollectedJSContext::AfterProcessTask(unsigned int) xpcom/base/CycleCollectedJSContext.cpp:362 Not a high crash rate, in fact rare. And most are version 56[1], but not all. For example bp-984d1f6d-aad3-4b43-ad27-a3df70171023 57.0b9 bp-9351123d-9978-4f00-8f9d-bdf920171022 58.0a1 [1] https://crash-stats.mozilla.com/signature/?product=Firefox&signature=js%3A%3Ashadow%3A%3AObject%3A%3AslotRef&date=%3E%3D2017-10-20T15%3A05%3A01.000Z&date=%3C2017-10-27T15%3A05%3A01.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-version&_sort=-date&page=1
Crashes come from many different sources that call js::shadow::Object::slotRef (mostly from DOM code, such as DOMGlobalHasProtoAndIFaceCache and Unwrap*() methods. Most of the crashes are wildptrs. 1 in the last 3 months is a ...bad0badN crash (GC poison). about 10 are UAF e5 crashes. Crashes go *way* back, but maybe only for some of the callers of slotRef(). We might consider making slotRef() a non-terminal in processing crashreports, but I'll leave that thought for people who look into this (and if they think there are multiple causes depending on caller, or if this is simply a landmine laid in this routine that lots of different calling paths can trip over. some of the crashes are also nullptr's, but it's unclear if that's by chance or not.
Group: core-security
The specific crash in comment 0 is in this code in JSXrayTraits::construct: JS::RootedObject holder(cx, self.ensureHolder(cx, wrapper)); if (self.getProtoKey(holder) == JSProto_Function) { on the "if" line. getProtoKey does: int32_t key = js::GetReservedSlot(holder, SLOT_PROTOKEY).toInt32(); return static_cast<JSProtoKey>(key); and the crash is at 0x8. So "holder" was null. As in, we OOMed or so, similar to bug 1407414. The crash report does say "System Memory Use Percentage: 98" and claims only 300MB of RAM are free... Looking at the general crash reports for Object::slotRef, a bunch of them are near-null from JSXrayTraits::construct. Some are near-null from WindowBinding::Wrap but it's hard to say what source that corresponds to without doing some digging. I so wish crash-stats showed the mercurial changeset. :( I see one with near-null in ModuleEnvironmentObject::lookupImport. I do see a bunch of wild-reads under UnwrapPossiblyNotInitializedDOMObject from things like DocumentFragmentBinding::_finalize and TextBinding::_finalize and MessageEventBinding::_finalize and DataTransferBinding::_finalize and NodeListBinding::DOMProxyHandler::finalize and so on. Those are called from either CycleCollectedJSRuntime::JSObjectsTenured or FinalizeTypedArenas<JSObject>. I am going to file a new bug on the JSXrayTraits::construct crash; that one is a guaranteed near-null, I believe. I'm pretty worried about those binding finalization crashes, though. Those suggest that the pointer to the thing we're finalizing is problematic, I think.
Flags: needinfo?(sphink)
I filed bug 1414461 for the ensureHolder OOM thing.
Group: core-security → dom-core-security
I looked at a sampling of the crash reports. I saw the _finalize ones. But I also saw https://crash-stats.mozilla.com/report/index/44db6088-00fc-4d07-aebf-3e6bc0171119 that is crashing in UnwrapDOMObject<MediaQueryList> which is not being called during finalization. Are partially or uninitialized DOM objects leaking out somewhere? One potential source might be bug 1418528, where we CheckedUnwrap something, get access denied, and then use the resulting nullptr anyway. Though I don't think that's very easy to trigger. There's also https://crash-stats.mozilla.com/report/index/c360692d-7340-4811-8c06-0b9590171119 that looks like it's coming from an ObjectValue(nullptr). If I'm really lucky, the patch in bug 1366083 will smoke out some of those, though they'll have to reach a GC to be caught, and the information provided won't really be more than what we already have in this bug. I'm also doing a try push that checks for ObjectValue(nullptr) being created, though I don't expect that to catch anything here. It might be interesting to see how much slowdown it provides, and perhaps try it out on beta.
Flags: needinfo?(sphink)
> Are partially or uninitialized DOM objects leaking out somewhere? That would be pretty bizarre, but maybe... > that looks like it's coming from an ObjectValue(nullptr) Can we do a diagnostic assert (so nightly and early beta only) on ObjectValue(nullptr) to smoke those out?
(for comment 5) In the interest of having this assigned so it looks from the outside like it's progressing, I'm giving it to you, Steve. We can change that as appropriate later. Thanks for understanding :)
Assignee: nobody → sphink
Flags: needinfo?(sphink)
Priority: -- → P1
(In reply to Andrew Overholt [:overholt] from comment #6) > (for comment 5) > > In the interest of having this assigned so it looks from the outside like > it's progressing, I'm giving it to you, Steve. We can change that as > appropriate later. Thanks for understanding :) I realize this will sound snarkier than it should, but what's the purpose of looking busy here? Also, I'm quite interesting in a response to bz's question as to whether some diagnostic asserts will help us here (comment 5).
Flags: needinfo?(overholt)
(In reply to Frederik Braun [:freddyb] from comment #7) > (In reply to Andrew Overholt [:overholt] from comment #6) > > (for comment 5) > > > > In the interest of having this assigned so it looks from the outside like > > it's progressing, I'm giving it to you, Steve. We can change that as > > appropriate later. Thanks for understanding :) > > I realize this will sound snarkier than it should, but what's the purpose of > looking busy here? Oh, I was jokingly referring to the new "sec-high bugs should have an assignee" process. (Not snarky; good question)
Flags: needinfo?(overholt)
(In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from comment #5) > > Are partially or uninitialized DOM objects leaking out somewhere? > > That would be pretty bizarre, but maybe... > > > that looks like it's coming from an ObjectValue(nullptr) > > Can we do a diagnostic assert (so nightly and early beta only) on > ObjectValue(nullptr) to smoke those out? I think this question is for you, Steve. Can you give us a reply?
Implemented bz's suggestion in comment 5 over in bug 1366083. Hopefully that will smoke something out.
Flags: needinfo?(sphink)
(In reply to Steve Fink [:sfink] [:s:] from comment #10) > Implemented bz's suggestion in comment 5 over in bug 1366083. Hopefully that > will smoke something out. Great! I'm not sure I understand the fix here :-) The crash signature in bug 1366083 shows no new reports, as of recently. I suppose that's great. But how does that affect the crash here? ;) What's left to be done? :-)
Flags: needinfo?(sphink)
(In reply to Frederik Braun [:freddyb] from comment #11) > (In reply to Steve Fink [:sfink] [:s:] from comment #10) > > Implemented bz's suggestion in comment 5 over in bug 1366083. Hopefully that > > will smoke something out. > > Great! I'm not sure I understand the fix here :-) > The crash signature in bug 1366083 shows no new reports, as of recently. I > suppose that's great. > > But how does that affect the crash here? ;) It doesn't. But based on previous analysis in this bug, it is hoped that the MOZ_DIAGNOSTIC_ASSERT will fire before this bug has a chance to manifest, bringing us (much) closer to the root cause. Release builds won't do the assertion, however, so we would expect this signature to still show up there. > What's left to be done? :-) Finding and fixing the bug. There's still not enough to go on. (That said, this bug has more info than many similar crashes.) On the other hand, it appears that the volume has dropped since February, if I'm reading the graph here right.
Flags: needinfo?(sphink)
> Finding and fixing the bug. There's still not enough to go on. (That said, > this bug has more info than many similar crashes.) > > On the other hand, it appears that the volume has dropped since February, if > I'm reading the graph here right. Looking at a 3-month report in crash-stats, and at the graph tab, I don't see any real change overall.
Wow, so all I found was only one report for the diagnostic assert: https://crash-stats.mozilla.com/report/index/c95c61b5-36c5-431d-af2f-e72f60180428
(In reply to Frederik Braun [:freddyb] from comment #14) > Wow, so all I found was only one report for the diagnostic assert: > https://crash-stats.mozilla.com/report/index/c95c61b5-36c5-431d-af2f- > e72f60180428 Ok, so that would be JSScript::getObject(jsbytecode*) returning nullptr. Which could be pc+1 pointing to a bad index, a corrupt object table in the script, mangled offsets that record where the various tables start, or something upstream could have written a nullptr into the table. I looked into adding a release assertion for when the objects get inserted into the table, but all of those are already guarded by nullptr checks. But I'm still not seeing other reports for the diagnostic assert, so setObject(nullptr) does not seem to be a large source of issues here.
Is this a candidate to be marked 'stalled', Steve?
Flags: needinfo?(sphink)
Sadly, yes.
Flags: needinfo?(sphink)
Keywords: stalled

I can't common on Fennec, but there are no crashes for current versions of Firefox and THunderbird in the past 6 months. https://crash-stats.mozilla.org/signature/?signature=js%3A%3Ashadow%3A%3AObject%3A%3AslotRef&date=%3E%3D2019-10-23T12%3A32%3A00.000Z&date=%3C2020-04-23T12%3A32%3A00.000Z

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME

Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit auto_nag documentation.

Keywords: stalled
Group: dom-core-security
You need to log in before you can comment on or make changes to this bug.