Closed Bug 487707 Opened 14 years ago Closed 11 years ago

TM: Can't record after deep-bail


(Core :: JavaScript Engine, defect, P2)

Other Branch





(Reporter: jorendorff, Unassigned)



Specifically, between the time a _FAIL builtin deep-bails and the time it returns (possibly much later, after much interpreter work), we can't record, because if we have GC'd, recording would free JIT code that we will later return to (history in bug 458288, bug 462042).

Two possible fixes:

1. When we deep-bail, modify the stack so that we return to different (static) code.  That code would have to do something with the builtin's return value.  (Suggested in bug 462042 comment 3.)

2. Just keep track of which JIT code pages might be on the stack, and don't recycle those.
If I understand Andreas correctly, we currently *do* try to record after deep-bails, which is a serious bug.  That could be patched quickly, but we should fix this instead.
Flags: blocking1.9.1?
We prohibit new recording of traces using the prohibitRecording flag, but this does not prevent other opportunities to flush the cache, i.e. through global shape mismatches etc. We have to get rid of the flag.
Hey, guess what.  You're right.  This bug makes the stuff in bug 487134 very crashy.
Blocks: 487134
Flags: blocking1.9.1? → blocking1.9.1+
Priority: -- → P1
I'm going to take a shot at this.  I'm doing fix #2 because it looks less like rocket science.
OK, here's what fix #2 would look like, I think:

1. Change the memory management so each page has a "pin count" and rewind() does not free pages with a nonzero pin count.

2. When emitting a _FAIL call, track the address of the generated machine code.  Note all pages that can execute after a deep-bail (that is, the page containing the compiled guard and side exit code; we don't care about the LIR or the SideExit struct).  Store the notes in the SideExit.

3. When we deep-bail, look at cx->bailExit and increment the pin count of each page that can execute later.  Record in the interpState which pages were pinned.

4. Once we return through those pages, decrement their pin count.  (We already have special code in js_LeaveTree that executes when a _FAIL builtin returns after having deep-bailed.)

Sounds like work to me.  I am tempted to just hack what we've got a little further; I think I can make it safe without running any risk of satisfying Andreas...
I am not opposed to hack the current approach into submission. Its ugly but we can probably make it safe.

#5 sounds a lot more complicated to me than properly unwinding and dropping the native code from the stack frame. As long we save the address of ebp in the InterpState on entry, we should be able to inject dummy code by exchanging the return address. As added benefit, we don't have to check for builtinStatus on trace. If we get back on trace, we didn't bail ...
Filed bug 487845 for the short-term hack.

Is there any other way JIT code can be blown away, short of a full rewind()? E.g. js_TrashTree? I hope not.
Ed is working on a code cache allocator/deallocator, but everything would follow the same rules. Right now its definitively just a full flush. We can do partial flushes eventually, but essentially with the same guarantees when we don't flush.
We can actually simplify #5 a lot.

1. All we need to keep around is the side exit code. The on-trace "did we fail" check is unnecessary. We don't emit it on trace. We muck with the return address to go directly to the side exit code in case of an error.

2. We can ensure that side exit code is always on one single page, simply by doing do { asm_exit() } while (page(start) != page(stop));

3. We don't have to keep the LIR around, or the VMSideExit structure. LeaveTree already consumed/used those.

4. Since side exit code is on a single page, all we have to do is pin that page as we do the _FAIL call. We can do that with a simple write to InterpStruct with the address of the page. InterpStruct must be linked as a list in case we really side exit.

5. Right before collecting the code cache, we walk that list and protect/remove/not free those pages. They get deallocated as we unwind.

Lot of text, but shouldn't be much code. 50 lines?
To find the right place to patch, we sample ESP immediately prior to the call. Nanojit grows a method that knows how to patch this, and redirect it to given guard record/side exit X. Basically a super lightweight support for making a side exit fail from inside that call, the moment the call completes. No builtinStatus check is emitted after the call. Only this forced patching causes the guard record to be jumped to. This will need some minimal arch dependent support code.
Does the caller store the return address in a predictable location on all architectures we care about?  (It does on x86, the only one I know much about. I am wondering if there are CPUs where the return address is left in a register, and the calling convention doesn't specify where the callee may store it.)
What should the LIR/nanojit API look like?  Maybe a new instruction:

  patchPtr = addpi state, offsetof(InterpState, returnPatchAddr)
  patchable_call patchPtr, fn(arg1, arg2)

which samples SP, stashes it in *patchPtr, and calls the function (all in one fat LIR insn).

Or it could be more like this:

  p = returnaddr
  st state[offsetof(InterpState, returnPatchAddr)] = p
  call fn(arg1, arg2)

but the return address depends on the number of arguments and calling convention; and it seems bad to allow arbitrary instructions between `returnaddr` and `call` because register spills could cause the `returnaddr` to be inaccurate.

Or we could pretend that it's a library function:

  call nanojit::patchableCall(patchPtr, callinfo, fn, arg1, arg2)

where callinfo must be a constant.
Assignee: general → jorendorff
Whiteboard: [is it really P1 blocker?]
See comment 1 and comment 3.

Whiteboard: [is it really P1 blocker?]
But see comment 7 and bug 487845.
We could just set a flag in CallInfo that causes the address of the return address on the stack to be written to InterpState (right before the actual call a simple move ESP -> InterpState).
(In reply to comment #14)
> But see comment 7 and bug 487845.

That bug's not a blocker -- should it be?

jorendorff, wants the plan here? Want help? Maybe we can split this up further.
(In reply to comment #16)
> (In reply to comment #14)
> > But see comment 7 and bug 487845.
> That bug's not a blocker -- should it be?

Probably - I do note that it's fixed in tracemonkey now, though. Does that unblock progress here?
OK, here is my plan now:

1.  get this working on x86/mac and post the patch
2.  get help implementing the interesting bits for other cpus

I still do not think this should be a P1 blocker.  If we're going to include changes that are not stability fixes--something I don't have strong feelings about--bug 487134 (tracing all natives) and bug 480192 (tracing getters and setters) seem at least as important.
If this isn't a P1 blocker, then I presume that we are no longer actually recording after a deep bail?  If that's the case, then I agree -- this shouldn't block beta, and we should evaluate it against bug 480192 especially.  (I don't think we need 487134 in 3.5, and it doesn't seem like as much of a win as 480192.)

Somewhat-speculatively minusing, but not intending to prejudice further discussion excessively.
Flags: blocking1.9.1+ → blocking1.9.1-
Priority: P1 → P2
Throwing this one back.  Explaining to Graydon what I was trying to do made my skin crawl, plus I have a P1 blocker whose fix did not stick today...
Assignee: jorendorff → general
Obsolete with the removal of tracejit.
Closed: 11 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.