Closed Bug 745194 Opened 8 years ago Closed 6 years ago

[jsdbg2] Crash on Heap, trying to execute NULL

Categories

(Core :: JavaScript Engine, defect, critical)

x86_64
Linux
defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: decoder, Assigned: jorendorff)

References

(Blocks 1 open bug)

Details

(Keywords: crash, testcase, Whiteboard: [js:p2] [jsbugmon:])

Attachments

(1 file, 1 obsolete file)

The following code crashes on mozilla-central revision 3fa30b0edd15 (options -n -m -a):


var optionNames = options().split(',');
for (var i = 0; i < optionNames.length; i++)
var optionName = optionNames[i];
options(optionName);
evaluate("\
gczeal(4);\
var debuggee = newGlobal('new-compartment');\
var dbg = Debugger(debuggee);\
try { \
        (function() { debuggee.eval(\"function f(){ hits++ } f.prototype = {}; new f;\"); })();\
} catch(exc1) {}\
debuggee.eval(\"function g() { var result = new f; g_hits++; return result; }\");\
dbg.onEnterFrame = function (frame) {\
    if (frame.constructing) {\
        savedFrame = frame;\
        assertEq(savedFrame.live, true);\
        return { return: \"pass\" };\
    }\
};\
assertEq(typeof debuggee.eval(\"g();\"), \"object\");\
");


Backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007ffff7f5fc48 in ?? ()
#2  0x0000000000000001 in ?? ()
#3  0x0000000000000000 in ?? ()


The crash was also present without the options stuff, since my driver deletes the options function. It came back into the game during minimization.
options("typeinfer");
gczeal(4);
var debuggee = newGlobal('new-compartment');
debuggee.eval("function f() {}" +
              "function g() { return new f; }");
var dbg = Debugger(debuggee);
dbg.onEnterFrame = function (frame) {
    if (frame.constructing) {
        frame.live;
        return { return: 0 };
    }
};
debuggee.eval("g()");
Methodjit code calls js::ScriptDebugPrologue. This fires the onEnterFrame hook, which returns a force-return resumption value, so js::ScriptDebugPrologue "returns" to the force-return trampoline:

  *f.returnAddressLocation() = f.cx->jaegerRuntime().forceReturnFromFastCall();
  return;

Here is that trampoline code:

    /* The JSStackFrame register may have been clobbered while returning,
       reload it. */
0x00000001012ee000:	mov    0x38(%rsp),%rbx
    /* Perform the frame epilogue. */
0x00000001012ee005:	lea    0x70(%rbx),%rcx
0x00000001012ee009:	mov    %rcx,0x20(%rsp)
0x00000001012ee00e:	mov    %rsp,%rdi
0x00000001012ee011:	mov    %rbx,0x38(%rsp)
0x00000001012ee016:	mov    $0x0,%r11
0x00000001012ee020:	mov    %r11,0x30(%rsp)
0x00000001012ee025:	mov    $0x100460510,%r11
0x00000001012ee02f:	callq  *%r11        ;; js::mjit::stubs::AnyFrameEpilogue
    /* Store any known return value */
0x00000001012ee032:	mov    0x38(%rsp),%rbx
0x00000001012ee037:	mov    $0xfff9000000000000,%rdi
0x00000001012ee041:	mov    $0x0,%rsi
0x00000001012ee04b:	testl  $0x4000,(%rbx)
0x00000001012ee051:	je     0x1012ee064      ;; (this branch is not taken)
0x00000001012ee057:	mov    0x30(%rbx),%rdi  ;; (which means HAS_RVAL is set)
0x00000001012ee05b:	mov    %r14,%rsi
0x00000001012ee05e:	and    %rdi,%rsi
0x00000001012ee061:	xor    %rsi,%rdi
    /* Return to the caller */
0x00000001012ee064:	mov    0x28(%rbx),%rax
0x00000001012ee068:	jmpq   *%rax

On the last instruction we jump to 0.

This means cx->fp()->ncode_ == NULL.
OK, here's the sequence of events.

1. Create debuggee compartment, compile some code there
2. Call chain: eval code -> g -> new f -> onEnterFrame hook
3. Just as the hook returns, from js::Interpret,
     we call js::gc::MaybeVerifyBarriers,
       which ends up calling js::PurgeJITCaches,
         which calls js::mjit::ClearAllFrames,
           which sets fp->ncode_ to NULL.
4. Because of the force return, we end up in trampoline code
   that jumps to fp->ncode_, which is now NULL.

So the specific bad interaction is calling ClearAllFrames while Debugger::onEnterFrame is on the stack.

gczeal(4) happens to cause this, but an ordinary GC would trigger it too, I think.
(In reply to Jason Orendorff [:jorendorff] from comment #3)
> So the specific bad interaction is calling ClearAllFrames while
> Debugger::onEnterFrame is on the stack.

Er, plus a forced return.
Yep, I can crash without gczeal(4).

For some reason this doesn't crash unless the script turns inference off. Not sure why.

Also I don't have any immediate idea how to fix this.


options("typeinfer");
var debuggee = newGlobal('new-compartment');
var dbg = Debugger(debuggee);
debuggee.eval("function f() {}");
dbg.onEnterFrame = function (frame) {
    if (frame.type == 'call') {
        gc();
        return { return: 0 };
    }
};
debuggee.eval("f()");
Correction: it crashes just the same with or without the options("typeinfer"); line.
Whiteboard: [js:p2]
Attached patch v1 (obsolete) — Splinter Review
bhackett suggested this hack and it worked like a charm.
Assignee: general → jorendorff
Attachment #691498 - Flags: review?(bhackett1024)
Attachment #691498 - Flags: review?(bhackett1024) → review+
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/9ec1c7f91699 - Debugger-onEnterFrame-resumption-06.js and Debugger-onEnterFrame-resumption-07.js were failing with --no-ion --no-ti and --no-ion --no-ti -a -d.
Reproduced locally.
Good grief. This hack does not work like a charm. I'm not sure why it seemed to work. Some horrible testing screw-up on my end. It ends up looking at the wrong stack frame altogether: not a subtle bug.
Attached patch v2Splinter Review
Sorry to bug you again.

*Now* it seems to work like a charm. The only difference being I'm jumping to JaegerInterpoline instead of JaegerInterpolineScripted. I know nothing about these routines, so I do need a sanity-check review (sorry).

TODO - add a quick test where the caller that we're force-returning to is native.
Attachment #691498 - Attachment is obsolete: true
Attachment #695007 - Flags: review?(bhackett1024)
Comment on attachment 695007 [details] [diff] [review]
v2

Hmm, I think that there are cases where you'll want to use JaegerInterpoline and others where JaegerInterpolineScripted is right, depending on whether f.fp() was invoked as an inline frame from jitcode (there are some docs on these in MethodJIT.cpp).

If f.fp()'s nativeReturnAddress was cleared then the GC also updated f.returnAddressLocation(), and you'd be better off just not updating f.returnAddressLocation() in that case (you can also compare f.returnAddressLocation() with JaegerInterpoline).  This would mean that the return trap would be ignored when a GC or something clears the frames and the interpreter resumes, though that could be handled by adding an entry to JaegerStatus and passing it back to the interpreter through cx->jaegerRuntime()->setLastUnfinished().

To be honest though, JM's days are numbered and after bug 805241 it is due to be removed from the codebase entirely.  So just getting this to a point where it doesn't crash would be my preference.
Attachment #695007 - Flags: review?(bhackett1024) → review+
Whiteboard: [js:p2] → [js:p2][jsbugmon:update]
Whiteboard: [js:p2][jsbugmon:update] → [js:p2] [jsbugmon:update,ignore]
JSBugMon: The testcase found in this bug no longer reproduces (tried revision b5cb88ccd907).
Whiteboard: [js:p2] [jsbugmon:update,ignore] → [js:p2] [jsbugmon:bisectfix]
Whiteboard: [js:p2] [jsbugmon:bisectfix] → [js:p2] [jsbugmon:]
JSBugMon: Fix Bisection requested, result:
autoBisect shows this is probably related to the following changeset:

The first good revision is:
changeset:   127203:420954df93b8
parent:      127202:30ad3babd330
parent:      120102:0c45e6378f1f
user:        Jan de Mooij
date:        Tue Jan 29 10:13:51 2013 +0100
summary:     Merge from mozilla-central.

Not all ancestors of this changeset have been checked.
Use bisect --extend to continue the bisection from
the common ancestor, 358dd2a32990.

This iteration took 144.079 seconds to run.

Oops! We didn't test rev 0c45e6378f1f, a parent of the blamed revision! Let's do that now.
Rev 0c45e6378f1f: Updating... Compiling... Testing... Exit status: CRASHED signal 11 (SIGSEGV) (0.107 seconds)
bad (interesting) 
As expected, the parent's label is the opposite of the blamed rev's label.
This r+'d patch from December 2012 never landed, but JM has since been removed and comment 13 says this crash is no longer reproducible.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.