Closed
Bug 487707
Opened 16 years ago
Closed 13 years ago
TM: Can't record after deep-bail
Categories
(Core :: JavaScript Engine, defect, P2)
Tracking
()
RESOLVED
WONTFIX
People
(Reporter: jorendorff, Unassigned)
References
Details
Specifically, between the time a _FAIL builtin deep-bails and the time it returns (possibly much later, after much interpreter work), we can't record, because if we have GC'd, recording would free JIT code that we will later return to (history in bug 458288, bug 462042).
Two possible fixes:
1. When we deep-bail, modify the stack so that we return to different (static) code. That code would have to do something with the builtin's return value. (Suggested in bug 462042 comment 3.)
2. Just keep track of which JIT code pages might be on the stack, and don't recycle those.
Reporter | ||
Comment 1•16 years ago
|
||
If I understand Andreas correctly, we currently *do* try to record after deep-bails, which is a serious bug. That could be patched quickly, but we should fix this instead.
Flags: blocking1.9.1?
Comment 2•16 years ago
|
||
We prohibit new recording of traces using the prohibitRecording flag, but this does not prevent other opportunities to flush the cache, i.e. through global shape mismatches etc. We have to get rid of the flag.
Reporter | ||
Comment 3•16 years ago
|
||
Hey, guess what. You're right. This bug makes the stuff in bug 487134 very crashy.
Blocks: 487134
Updated•16 years ago
|
Flags: blocking1.9.1? → blocking1.9.1+
Priority: -- → P1
Reporter | ||
Comment 4•16 years ago
|
||
I'm going to take a shot at this. I'm doing fix #2 because it looks less like rocket science.
Reporter | ||
Comment 5•16 years ago
|
||
OK, here's what fix #2 would look like, I think:
1. Change the memory management so each page has a "pin count" and rewind() does not free pages with a nonzero pin count.
2. When emitting a _FAIL call, track the address of the generated machine code. Note all pages that can execute after a deep-bail (that is, the page containing the compiled guard and side exit code; we don't care about the LIR or the SideExit struct). Store the notes in the SideExit.
3. When we deep-bail, look at cx->bailExit and increment the pin count of each page that can execute later. Record in the interpState which pages were pinned.
4. Once we return through those pages, decrement their pin count. (We already have special code in js_LeaveTree that executes when a _FAIL builtin returns after having deep-bailed.)
Sounds like work to me. I am tempted to just hack what we've got a little further; I think I can make it safe without running any risk of satisfying Andreas...
Comment 6•16 years ago
|
||
I am not opposed to hack the current approach into submission. Its ugly but we can probably make it safe.
#5 sounds a lot more complicated to me than properly unwinding and dropping the native code from the stack frame. As long we save the address of ebp in the InterpState on entry, we should be able to inject dummy code by exchanging the return address. As added benefit, we don't have to check for builtinStatus on trace. If we get back on trace, we didn't bail ...
Reporter | ||
Comment 7•16 years ago
|
||
Filed bug 487845 for the short-term hack.
Is there any other way JIT code can be blown away, short of a full rewind()? E.g. js_TrashTree? I hope not.
Comment 8•16 years ago
|
||
Ed is working on a code cache allocator/deallocator, but everything would follow the same rules. Right now its definitively just a full flush. We can do partial flushes eventually, but essentially with the same guarantees when we don't flush.
Comment 9•16 years ago
|
||
We can actually simplify #5 a lot.
1. All we need to keep around is the side exit code. The on-trace "did we fail" check is unnecessary. We don't emit it on trace. We muck with the return address to go directly to the side exit code in case of an error.
2. We can ensure that side exit code is always on one single page, simply by doing do { asm_exit() } while (page(start) != page(stop));
3. We don't have to keep the LIR around, or the VMSideExit structure. LeaveTree already consumed/used those.
4. Since side exit code is on a single page, all we have to do is pin that page as we do the _FAIL call. We can do that with a simple write to InterpStruct with the address of the page. InterpStruct must be linked as a list in case we really side exit.
5. Right before collecting the code cache, we walk that list and protect/remove/not free those pages. They get deallocated as we unwind.
Lot of text, but shouldn't be much code. 50 lines?
Comment 10•16 years ago
|
||
To find the right place to patch, we sample ESP immediately prior to the call. Nanojit grows a method that knows how to patch this, and redirect it to given guard record/side exit X. Basically a super lightweight support for making a side exit fail from inside that call, the moment the call completes. No builtinStatus check is emitted after the call. Only this forced patching causes the guard record to be jumped to. This will need some minimal arch dependent support code.
Reporter | ||
Comment 11•16 years ago
|
||
Does the caller store the return address in a predictable location on all architectures we care about? (It does on x86, the only one I know much about. I am wondering if there are CPUs where the return address is left in a register, and the calling convention doesn't specify where the callee may store it.)
Reporter | ||
Comment 12•16 years ago
|
||
What should the LIR/nanojit API look like? Maybe a new instruction:
patchPtr = addpi state, offsetof(InterpState, returnPatchAddr)
patchable_call patchPtr, fn(arg1, arg2)
which samples SP, stashes it in *patchPtr, and calls the function (all in one fat LIR insn).
Or it could be more like this:
p = returnaddr
st state[offsetof(InterpState, returnPatchAddr)] = p
call fn(arg1, arg2)
but the return address depends on the number of arguments and calling convention; and it seems bad to allow arbitrary instructions between `returnaddr` and `call` because register spills could cause the `returnaddr` to be inaccurate.
Or we could pretend that it's a library function:
call nanojit::patchableCall(patchPtr, callinfo, fn, arg1, arg2)
where callinfo must be a constant.
Updated•16 years ago
|
Assignee: general → jorendorff
Updated•16 years ago
|
Whiteboard: [is it really P1 blocker?]
Comment 13•16 years ago
|
||
Whiteboard: [is it really P1 blocker?]
Reporter | ||
Comment 14•16 years ago
|
||
But see comment 7 and bug 487845.
Comment 15•16 years ago
|
||
We could just set a flag in CallInfo that causes the address of the return address on the stack to be written to InterpState (right before the actual call a simple move ESP -> InterpState).
Comment 16•16 years ago
|
||
(In reply to comment #14)
> But see comment 7 and bug 487845.
That bug's not a blocker -- should it be?
/be
Comment 17•16 years ago
|
||
jorendorff, wants the plan here? Want help? Maybe we can split this up further.
Comment 18•16 years ago
|
||
(In reply to comment #16)
> (In reply to comment #14)
> > But see comment 7 and bug 487845.
>
> That bug's not a blocker -- should it be?
Probably - I do note that it's fixed in tracemonkey now, though. Does that unblock progress here?
Reporter | ||
Comment 19•16 years ago
|
||
OK, here is my plan now:
1. get this working on x86/mac and post the patch
2. get help implementing the interesting bits for other cpus
I still do not think this should be a P1 blocker. If we're going to include changes that are not stability fixes--something I don't have strong feelings about--bug 487134 (tracing all natives) and bug 480192 (tracing getters and setters) seem at least as important.
Comment 20•16 years ago
|
||
If this isn't a P1 blocker, then I presume that we are no longer actually recording after a deep bail? If that's the case, then I agree -- this shouldn't block beta, and we should evaluate it against bug 480192 especially. (I don't think we need 487134 in 3.5, and it doesn't seem like as much of a win as 480192.)
Somewhat-speculatively minusing, but not intending to prejudice further discussion excessively.
Flags: blocking1.9.1+ → blocking1.9.1-
Priority: P1 → P2
Reporter | ||
Comment 21•16 years ago
|
||
Throwing this one back. Explaining to Graydon what I was trying to do made my skin crawl, plus I have a P1 blocker whose fix did not stick today...
Assignee: jorendorff → general
Comment 22•13 years ago
|
||
Obsolete with the removal of tracejit.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•