Closed Bug 969441 Opened 10 years ago Closed 10 years ago

crash in JS_TransplantObject(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>)

Categories

(Core :: JavaScript Engine, defect)

27 Branch
x86
All
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla30
Tracking Status
firefox27 + wontfix
firefox28 + wontfix
firefox29 + verified
firefox30 --- verified

People

(Reporter: tracy, Assigned: bholley)

References

Details

(Keywords: crash, topcrash)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-66495833-362a-4136-863f-49e8f2140206.
=============================================================

This has skyrocket to #1 topcrasher Fx27 release.

Is this bug 793904 in action in high volume or something else?
reports are all but one on either Windows or Mac.  

Also this currently #13 on Fx28.
OS: Windows NT → All
Flags: needinfo?(wmccloskey)
(In reply to [:tracy] Tracy Walker - QA Mentor from comment #0)

> Is this bug 793904 in action in high volume or something else?

Thankfully not. In that case we would have never made it to JS_TransplantObject.

This is probably the same as bug 856670 though.
(In reply to Bobby Holley (:bholley) from comment #2)

> This is probably the same as bug 856670 though.

Should we dupe and add this signature to that bug or track independently?
I think this bug is different from bug 856670. This bug doesn't appear to involve stack exhaustion, while that one did. Maybe we should add the RemapWrapper signature to this bug and close the other one.
Flags: needinfo?(wmccloskey)
> > Is this bug 793904 in action in high volume or something else?
> 
> Thankfully not. In that case we would have never made it to JS_TransplantObject.

What do you mean? We are crashing intentionally because of the code added in bug 793904. I don't know why wrap() is failing though.

All I've noticed so far is that the stacks for the recent JS_TransplantObject and RemapWrapper crashes are remarkably similar. They all involve:
1. handling for an exception
2. opening the slow script dialog
3. running a nested event loop to show the modal window
4. getting some HTTP data in
5. loading a new document and triggering JS_TransplantObject, which crashes

Setting needinfo bholley since he asked me to.
Flags: needinfo?(bobbyholley)
(In reply to Bill McCloskey (:billm) from comment #5)
> > Thankfully not. In that case we would have never made it to JS_TransplantObject.
> 
> What do you mean? We are crashing intentionally because of the code added in
> bug 793904.

Oh, sorry. I got that bug confused with bug 922009.

> I don't know why wrap() is failing though.

As discussed on IRC, this bug involves failure in swap() (the wrap() failure is bug 856670).
Flags: needinfo?(bobbyholley)
I got some help looking at the crash dumps from dmajor. It looks like we're pretty close to using up the stack in the JS_TransplantObject crashes. We looked at two crashes, and we've used up over 90% of the full 1MB stack in both.

Also, the frames below js::GetAndClearException are probably JIT code since those memory regions are marked writable.

One question is why JSObject::swap would fail due to stack exhaustion. I'm having trouble finding stack checks in there.

Overall, though, this suggests that we may have a bug in the JIT's stack overflow checking.
Why are we using so much stack in the first place?

The report in [1] has no recognizable symbols on the stack between 0x0063ba28 and 0x0070e24c -- that means it's all JIT frames, 862KB of them.

[1] https://crash-stats.mozilla.com/report/index/d52a9678-32d7-4b39-be4e-cfdc22140206
By the way, the symbols surrounding that large region are:
0063ba28  72990c38 mozjs!GetAndClearExceptionInfo
0070e24c  726bec00 mozjs!EnterBaseline+0x130

I don't know the JIT code very well, but maybe it caught its own stack problem and tried to report it?
Yeah, it's possible that the script is overflowing the stack, which causes an exception to be generated, and then the script tries to handle the exception. And upon entering the exception handler, we also trigger the slow script dialog for some reason.

I still haven't found any place where JSObject::swap does a stack overflow check. The only way it seems to fail is when it can't allocate memory. And most of these crash dumps have plenty of memory.
CCing some more people in case somebody has an idea. This is the #1 topcrash in Firefox 27.
Naveed, can you help get more eyes on this?
Flags: needinfo?(nihsanullah)
Is this really (still) a topcrash? On crash-stats I see it at #94 for Firefox 27... Am I doing something wrong?
It was high up in the early days of 27.0 but seems to have trickled down in ranks. The question is if it could flare up due to some website(s) triggering it - we did get burned by something that went lower relative volume recently (that's what caused the 27.0.1 release) so we are more careful with de-prioritizing now.
jorendorff: It would help if we can just answer the question "Can JSObject::swap call js_ReportOverRecursed?" statically.

Which js_ReportOverRecursed do you mean?

  #48273 uint8 JSObject::swap(JSContext*, class JS::Handle<JSObject*>, class JS::Handle<JSObject*>)
  #58155 uint8 JSObject::ReserveForTradeGuts(JSContext*, JSObject*, JSObject*, JSObject::TradeGutsReserved*)
  #13501 uint8 js::SetClassAndProto(JSContext*, class JS::Handle<JSObject*>, js::Class*, class JS::Handle<js::TaggedProto>, uint8*)
  #49488 uint8 JSObject::splicePrototype(JSContext*, js::Class*, class JS::Handle<js::TaggedProto>)
  #13290 js::types::TypeObject* JSObject::getType(JSContext*)
  #13489 js::types::TypeObject* JSObject::makeLazyType(JSContext*, class JS::Handle<JSObject*>)
  #9192 JSScript* JSFunction::getOrCreateScript(JSContext*)
  #9195 uint8 JSFunction::createScriptForLazilyInterpretedFunction(JSContext*, class JS::Handle<JSFunction*>)
  #38639 JSScript* js::CloneScript(JSContext*, class JS::Handle<JSObject*>, class JS::Handle<JSFunction*>, const class JS::Handle<JSScript*>, uint32)
  #11456 uint8 JSCompartment::wrap(JSContext*, class JS::MutableHandle<JSObject*>, class JS::Handle<JSObject*>)
  #11363 void js_ReportOverRecursed(JSContext*)

or

  #48273 uint8 JSObject::swap(JSContext*, class JS::Handle<JSObject*>, class JS::Handle<JSObject*>)
  #58155 uint8 JSObject::ReserveForTradeGuts(JSContext*, JSObject*, JSObject*, JSObject::TradeGutsReserved*)
  #13501 uint8 js::SetClassAndProto(JSContext*, class JS::Handle<JSObject*>, js::Class*, class JS::Handle<js::TaggedProto>, uint8*)
  #49488 uint8 JSObject::splicePrototype(JSContext*, js::Class*, class JS::Handle<js::TaggedProto>)
  #13290 js::types::TypeObject* JSObject::getType(JSContext*)
  #13489 js::types::TypeObject* JSObject::makeLazyType(JSContext*, class JS::Handle<JSObject*>)
  #9192 JSScript* JSFunction::getOrCreateScript(JSContext*)
  #9195 uint8 JSFunction::createScriptForLazilyInterpretedFunction(JSContext*, class JS::Handle<JSFunction*>)
  #32105 uint8 js::frontend::CompileLazyFunction(JSContext*, class JS::Handle<js::LazyScript*>, uint16*, uint64)
  #32120 uint8 js::frontend::EmitFunctionScript(js::ExclusiveContext*, js::frontend::BytecodeEmitter*, js::frontend::ParseNode*)
  #32073 uint8 js::frontend::EmitTree(js::ExclusiveContext*, js::frontend::BytecodeEmitter*, js::frontend::ParseNode*)
  #17945 void js_ReportOverRecursed(js::ThreadSafeContext*)

? Ignoring the numbers at the beginnings of the lines, those are both
possible call sequences.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #14)
> It was high up in the early days of 27.0 but seems to have trickled down in
> ranks.

Judging by <https://crash-stats.mozilla.com/report/list?product=Firefox&signature=JS_TransplantObject%28JSContext*%2C+JS%3A%3AHandle%3CJSObject*%3E%2C+JS%3A%3AHandle%3CJSObject*%3E%29#tab-graph> it's spiky and the spikes have not stopped happening.

(It looks like time is reversed on that graph. What.)
Flags: needinfo?(jorendorff)
I was wondering yesterday if we could expose JS_TransplantObject to the shell, to get fuzzing etc. It sounds scary, and the shell function would need a bunch of checks to avoid triggering various asserts, but maybe it's doable.
There's a patch to expose transplant to fuzzing in bug 863847. I never got around to landing it though. I'd be fine landing it if someone will commit to fixing the bugs that get filed; I don't have time right now.

Based on the stacks Steve posted, it seems somewhat likely that we're getting a stack overflow here. The easiest way to fix it is probably to fail early, at the top of JS_TransplantObject. We'd just need to stick a stack overflow check there. I'm wondering whether Gecko/xpconnect will handle failure gracefully there. Can you answer that Bobby? I think SetNewDocument is the main callpath to worry about.
(In reply to Bill McCloskey (:billm) from comment #18)
> Based on the stacks Steve posted, it seems somewhat likely that we're
> getting a stack overflow here. The easiest way to fix it is probably to fail
> early, at the top of JS_TransplantObject. We'd just need to stick a stack
> overflow check there. I'm wondering whether Gecko/xpconnect will handle
> failure gracefully there. Can you answer that Bobby? I think SetNewDocument
> is the main callpath to worry about.

Well, ReparentWrapper is also a dicey case.

If this is just a stack-limit check we're concerned about here, I think it makes more sense to expose a way for Gecko to do the stack check, and have it perform that check at the very top of any function that invokes xpc::TransplantWrapper. Once we get to the actually transplant call, it's usually too late to recover gracefully, so we MOZ_CRASH.
Flags: needinfo?(bobbyholley)
(In reply to Bobby Holley (:bholley) from comment #20)
> If this is just a stack-limit check we're concerned about here, I think it
> makes more sense to expose a way for Gecko to do the stack check, and have
> it perform that check at the very top of any function that invokes
> xpc::TransplantWrapper. Once we get to the actually transplant call, it's
> usually too late to recover gracefully, so we MOZ_CRASH.

All the stack checking is in jsfriendapi.h, so this should be no problem:
http://mxr.mozilla.org/mozilla-central/source/js/src/jsfriendapi.h#837
We're wrapping up FF28 beta today and since we lived with this in 27 already, wontfixing for 28. If there's a low risk fix ready please nominate for uplift at that time.
Attachment #8387145 - Flags: review?(wmccloskey) → review+
Let's see if this fixes it.

https://hg.mozilla.org/integration/mozilla-inbound/rev/b850bc9ae306
Whiteboard: [leave open]
Keywords: leave-open
Whiteboard: [leave open]
bholley says this fix will probably fix related bug 856670. The patch is very low risk and should be uplifted to Aurora 29 if it works.
Flags: needinfo?(nihsanullah)
See Also: → 856670
Note that the signature in bug 856670 is more common than the one here, so if the patch here makes js::RemapWrapper drop significantly, that might mean we are good for both issues.
Bobby, as Kairo said in comment #27, we are good here. Can you request an uplift (and close the bug)?
Flags: needinfo?(bobbyholley)
(In reply to Sylvestre Ledru [:sylvestre] from comment #28)
> Bobby, as Kairo said in comment #27, we are good here. Can you request an
> uplift (and close the bug)?

He never actually confirm that these crashes went away. Did they, Kairo?
Flags: needinfo?(bobbyholley) → needinfo?(kairo)
Looking at the Build ID facet tab of https://crash-stats.mozilla.com/search/?signature=~JS_TransplantObject&release_channel=nightly&date=>2014-03-01&_facets=build_id&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform it looks like the patch has helped. There was one stray crash with this signatures on a 3/12 build, but otherwise none after this patch landed.
Flags: needinfo?(kairo)
SGTM.
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: needinfo?(jorendorff)
Resolution: --- → FIXED
Assignee: nobody → bobbyholley
Target Milestone: --- → mozilla30
Comment on attachment 8387145 [details] [diff] [review]
Check for recursion outside of the hairy transplant callsites. v1

[Approval Request Comment]
Bug caused by (feature/regressing bug #): n/a
User impact if declined: crashes
Testing completed (on m-c, etc.): baked on m-c and m-a
Risk to taking this patch (and alternatives if risky): very low risk
String or IDL/UUID changes made by this patch: none
Attachment #8387145 - Flags: approval-mozilla-beta?
Attachment #8387145 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
mo longer seeing any occurrences of this signature across dev channels.
Status: RESOLVED → VERIFIED
Keywords: verifyme
See Also: → 1053999
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: