Closed
Bug 1384544
Opened 7 years ago
Closed 6 years ago
Crash in NewFunctionClone
Categories
(Core :: JavaScript Engine, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla59
People
(Reporter: calixte, Assigned: tcampbell)
References
(Blocks 1 open bug)
Details
(5 keywords, Whiteboard: [adv-main58+][post-critsmash-triage])
Crash Data
Attachments
(1 file)
1.88 KB,
patch
|
sfink
:
review+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is report bp-3a20697d-b17d-4dbf-937b-dd9280170726. ============================================================= There are 18 crashes in nightly 56 and and 26 in beta 55. :kmag, could you investigate please ?
Flags: needinfo?(kmaglione+bmo)
Comment 1•7 years ago
|
||
This signature seems to be spiking in 55.0b12.
Comment 2•7 years ago
|
||
#20 crash in beta 56. This is often a startup crash. Andrew, can you help find someone to take a look at this in case it is fixable for 56? Thanks.
Flags: needinfo?(overholt)
Keywords: regression
Comment 3•7 years ago
|
||
This code changed in bug 1337578 so maybe jonco has an idea?
Flags: needinfo?(overholt) → needinfo?(jcoppeard)
Updated•7 years ago
|
Updated•7 years ago
|
status-firefox57:
--- → affected
Comment 4•7 years ago
|
||
98% of crashes are reading from address 0xffffffffffffff86. The stack is: NewFunctionClone CloneFunctionReuseScript CloneFunctionObjectIfNotSingleton js::Lambda Interpret
Updated•7 years ago
|
Component: XPConnect → JavaScript Engine
Flags: needinfo?(jcoppeard)
Comment 5•7 years ago
|
||
Not a fix. Refactor to add an assertion to JSFunction::setPrefixedBoundFunctionName() that the atom is marked.
Attachment #8907708 -
Flags: review?(sphink)
Updated•7 years ago
|
Attachment #8907708 -
Flags: review?(sphink) → review+
Updated•7 years ago
|
Keywords: leave-open
Pushed by jcoppeard@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/971c6a19e3b5 Improve atom marking assertions for JSFunction's atom r=sfink
Updated•7 years ago
|
Priority: -- → P1
Updated•7 years ago
|
Assignee: nobody → jcoppeard
Comment 7•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/971c6a19e3b5
Comment 9•7 years ago
|
||
Those assertions didn't catch anything and I'm pretty much at a loss to explain what's happening here. anba, is it possible something in bug 1371593 could have caused this?
Flags: needinfo?(nihsanullah) → needinfo?(andrebargull)
Comment 10•7 years ago
|
||
(In reply to Jon Coppeard (:jonco) from comment #9) > anba, is it possible something in bug 1371593 could have caused this? I don't think so. bug 1371593 only affects bound functions and those aren't cloneable [1, 2], so they shouldn't appear in NewFunctionClone. bug 1371593 also changes JSFunction::Flags::STABLE_ACROSS_CLONES, but that flag is only used in FrameIter::matchCallee [3]. More importantly the NewFunctionClone crash at address 0xffffffffffffff86 already happens in Fx55, but bug 1371593 only landed in Fx56. [1] http://searchfox.org/mozilla-central/rev/7ba03cebf1dcf3b4c40c85a4efad42a5022cf1ec/js/src/jsfun.cpp#2218 [2] http://searchfox.org/mozilla-central/rev/7ba03cebf1dcf3b4c40c85a4efad42a5022cf1ec/js/src/jsfun.cpp#2252 [3] http://searchfox.org/mozilla-central/rev/7ba03cebf1dcf3b4c40c85a4efad42a5022cf1ec/js/src/vm/Stack.cpp#1228
Flags: needinfo?(andrebargull)
Comment 11•7 years ago
|
||
Clearing needinfo. I don't think this has anything to do with anything I've worked on.
Flags: needinfo?(kmaglione+bmo)
Comment 12•7 years ago
|
||
The volume is quite high on release and beta. Jon, I understand you are a bit stuck but do you know who could help with that? Thanks By the way, esr 52 is affected too. So, not something recent; However, the spike is recent
Comment 13•7 years ago
|
||
Jan, do you have any ideas on what might be going wrong here?
Flags: needinfo?(jcoppeard) → needinfo?(jdemooij)
Comment 14•7 years ago
|
||
signature has shifted to [@ js::gc::AtomMarkingRuntime::markAtom<T>] in 57.0b6
Crash Signature: [@ NewFunctionClone] → [@ NewFunctionClone]
[@ js::gc::AtomMarkingRuntime::markAtom<T>]
Comment 15•7 years ago
|
||
I've been looking at crash dumps. The problem here is that the function we're cloning in js::Lambda is bogus - we have an object that doesn't look like a function. It *does* seem to be a valid NativeObject, but it's definitely not a JSFunction. It seems to be an object with 3 fixed slots, storing an int32 0, a string, and another int32 0. The 0xffffff86 is the JSString* type tag of the second slot - where we expect the function's atom. That means we basically have the following possibilities: (1) The bytecode is bogus, either the op (JSOP_LAMBDA) or its immediate (the index in the script's object list). (2) The script's object list contains a bogus object where we expect our JSFunction*. Either way, something about this script is bogus. Since these scripts come from the subscript loader, it might be XDR data that got encoded/decoded incorrectly, or got corrupted.
Comment 16•7 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #15) > It *does* seem to be a valid NativeObject, but it's definitely not a > JSFunction. It seems to be an object with 3 fixed slots, storing an int32 0, > a string, and another int32 0. This has been confusing me for a while, until I just realized this is a RegExpObject! That makes sense because there's also a (4-byte) 0x0 pointer following these 3 slots (which I thought was just uninitialized memory) but it's the RegExpObject's private slot for the RegExpShared and we only use the first 4 bytes. So some progress - we have a RegExpObject instead of our lambda JSFunction in js::Lambda.
Comment 18•7 years ago
|
||
npb, tcampbell: just FYI, see comments 15 and 16 - reminds me of some other XDR/script related memory corruption issues. Most of these crashes are on startup on Win32 - would be great if QA could help test that.
Flags: needinfo?(tcampbell)
Flags: needinfo?(nicolas.b.pierron)
Updated•7 years ago
|
Crash Signature: [@ NewFunctionClone]
[@ js::gc::AtomMarkingRuntime::markAtom<T>] → [@ NewFunctionClone]
[@ js::gc::AtomMarkingRuntime::markAtom<T>]
[@ JSContext::markAtom]
Comment 20•7 years ago
|
||
From duplicate bug 1412838: > This crash is #7 top browser crash > in B12: http://bit.ly/2lsUPN8. It was present in Beta 11 but in very small > numbers. > 68% Correlation to: Addon "followonsearch@mozilla.com" Version = 0.9.6 > [34.15% vs 17.70% if process_type = null]
Comment 21•7 years ago
|
||
EXEC crashes with wildptr addresses -> sec critical
Group: core-security
Flags: needinfo?(nihsanullah)
Flags: needinfo?(jcoppeard)
Keywords: csectype-wildptr,
sec-critical
Updated•7 years ago
|
Group: core-security → javascript-core-security
Comment 22•7 years ago
|
||
Mark, do you have any ideas how this could be related to followon search? Maybe we're caching code in a way that interacts badly with how we deploy these system addons.
Flags: needinfo?(standard8)
Comment 23•7 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #22) > Mark, do you have any ideas how this could be related to followon search? Unfortunately no. We did just land 0.9.6 in beta (bug 1411518) but I don't see what that would trigger. > Maybe we're caching code in a way that interacts badly with how we deploy > these system addons. I'm suspicious about the correlations - all system add-ons are shipped to all users (e.g. see about:support). AFAIK the correlation is based on what add-ons are installed and not about which ones are actually involved in the crash. So if it is correlating system add-ons in the "its installed" manner, then that's going to be bogus. If it is via what caused the stack / it is in the stack, then that might be ok.
Flags: needinfo?(standard8)
Comment 24•7 years ago
|
||
The correlation with followonsearch seems likely to be "followonsearch 0.9.6 and whatever makes this more common (or changed the signature) landed at the same time".
Updated•6 years ago
|
Flags: needinfo?(jcoppeard)
Updated•6 years ago
|
status-firefox59:
--- → affected
Assignee | ||
Comment 25•6 years ago
|
||
Looking at this issue again, it seems the problem is the JSScript has be object data. A common set of these crashes involve JSOP_LAMBDA on the stack which involves calling JSScript::getFunction() and assuming result is a normal function. The NewFunctionClone and markAtom crashes are just a fallout from this bogus data. (Based on something :jandem mentioned in another thread that I can't find..) Now on 32-bit, JSFunction::atom_ is at same offset as RegExpObject::slot[SOURCE_SLOT].tag which should always be 0xFFFFFF86 (the JS::Value tag for an encoded string) which is what we see as crash address in most of these. The RegExpObject and FunctionObject both live in the same array on a script. I'm currently looking into XDR crashes and have seen crashes indicating memory corruption. I'll add more assertions to my local experiment around these function objects. I'm not sure throwing in a RELEASE_ASSERT to js::Lambda that the object is actually a function will get as closer to the source. We would just move the crash a few call frames sooner.
Assignee | ||
Comment 26•6 years ago
|
||
Another point to make is that due to [1], if XDR data is zero (due to some default buffer or something) we will decode a lambda target as a RegExpObject and set StringValue which will lead to this crash. One thing we should do is not use zero as a valid enum value in XDR. This XDR case statement already cancels the decode on unsupported values. This doesn't identify the underlying XDR bug we are solving, but it should help reduce startup crashes by falling back to slow path when things go bad. [1] https://searchfox.org/mozilla-central/rev/c9214daa09e3eb8246b1448e469df1652a140825/js/src/jsscript.cpp#318
Assignee | ||
Updated•6 years ago
|
Assignee: nobody → tcampbell
Flags: needinfo?(tcampbell)
Comment 27•6 years ago
|
||
For MarkAtom (only): A hint: looking at the crash data, this appears to have appeared in 55.0b6 around July 1st. (there are low-level random crashes that predate that, but it clearly first spikes in beta there, followed by a spike when 55 goes to release, and another spike when 56 goes to release). This implies at least the 55 spike is due to an uplift new in 55.0b6. That may seriously narrow down the possible causes, depending on the churn level, and perhaps provide a few suspects to look at more carefully or to land diagnostic asserts against. Also, the spike appears to cause a new address to show up: 0xffffffffffffff86. This may have been fixed in a later beta, as the signature for MarkAtom drops off (a lot, to 0-2 crashes/day)). Then, on Oct 28th, in 57b12, there's a new spike in MarkAtom (again with the unique 0xfffffffffff86), which continues into 57 release. This is far more interesting, as unlike the 55b6 crash, this on continues and gets into release. Someone needs to look at the delta from b11 (I presume, if it shipped) to b12. I *suspect* that the ongoing low-level set of crashes across all version with wildptrs or 0xfffffffffff (basically unknown address) is distinct from the bug causing the spike first in 55.0b6, and then again in 57b12 and later.
Comment 28•6 years ago
|
||
55b5->55b6 changelog: https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=FIREFOX_55_0b5_RELEASE&tochange=FIREFOX_55_0b6_RELEASE 57b11->57b12 changelog: https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=FIREFOX_57_0b11_RELEASE&tochange=FIREFOX_57_0b12_RELEASE
Assignee | ||
Comment 29•6 years ago
|
||
Thanks for the analysis. Nothing stands out from either of those changelogs. If my theory about address 0xfffffff86 holds, we may be a long way from the original source of problem (ie a previous browser session). I'm working on a patch in Bug 1418894 to turn the 0xFFFFFF86 crashes from wild-ptr crashes to assertion failures while loading the cache. This should help us determine if these crashes are almost all the result of XDR problems (errors encoding in a previous session, or a buffer / logic error in current startup). In Bug 1390856, we identify a real defect in XDR that could potentially lead to these problems, but I have doubts it is the largest contributer.
Depends on: 1418894
Assignee | ||
Comment 30•6 years ago
|
||
I'm requesting uplift of Bug 1390856 to beta to see if it has an impact on these rates. Another idea that :mconley proposed is to see what happens to XDR cache under startup or shutdown crash. For example, on a start-up crash, is it possible we write a partial cache file and then read by zeroes later. A zero byte at many places in the cache file would lead directly to this crash signature.
Updated•6 years ago
|
tracking-firefox57:
+ → ---
Comment 31•6 years ago
|
||
(In reply to Ted Campbell [:tcampbell] from comment #30) > I'm requesting uplift of Bug 1390856 to beta to see if it has an impact on > these rates. it doesn't look like there's any improvements in the rates in 58.0b6
Assignee | ||
Comment 32•6 years ago
|
||
Bug 1418894 attempts to detect XDR buffer corruption earlier and fall-back to normal script loading instead of startup cache. Based on theory in Comment 25 / Comment 26, this should greatly reduce the crash signature tracked in this bug. It does not address the underlying reasons why XDR buffer might become corrupt. That is still under investigation. I'm requesting uplift of Bug 1418894 to beta to see if it impacts these crash rates. (Clearing ni? for others since I'm actively investigating this issue)
Flags: needinfo?(nihsanullah)
Flags: needinfo?(nicolas.b.pierron)
Flags: needinfo?(jdemooij)
Comment 33•6 years ago
|
||
so far it looks like bug 1418894 had a very positive effect on the signatures in this bug, as they are essentially gone in 58.0b9
Assignee | ||
Comment 34•6 years ago
|
||
There are almost no crashes on 58.0b9. I'm going to say Bug 1418894 resolved this and points the root cause to XDR data corruption.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Updated•6 years ago
|
Target Milestone: --- → mozilla59
Updated•6 years ago
|
Group: javascript-core-security → core-security-release
Updated•6 years ago
|
Whiteboard: [adv-main58+]
Updated•6 years ago
|
Flags: qe-verify-
Whiteboard: [adv-main58+] → [adv-main58+][post-critsmash-triage]
Updated•5 years ago
|
Group: core-security-release
You need to log in
before you can comment on or make changes to this bug.
Description
•