Closed Bug 609222 Opened 14 years ago Closed 14 years ago

JM: Fix call mechanism and recompilation

Categories

(Core :: JavaScript Engine, defect)

x86
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: dvander, Assigned: dvander)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(7 files, 4 obsolete files)

19.70 KB, patch
dmandelin
: review+
Details | Diff | Splinter Review
54.27 KB, patch
dmandelin
: review+
adrake
: review+
Details | Diff | Splinter Review
3.20 KB, patch
m_kato
: review+
Details | Diff | Splinter Review
8.38 KB, patch
dmandelin
: review+
Details | Diff | Splinter Review
85.46 KB, patch
Details | Diff | Splinter Review
4.11 KB, patch
Details | Diff | Splinter Review
4.27 KB, patch
dvander
: review+
Details | Diff | Splinter Review
Attached patch part 1: hide ABI gunk (obsolete) — Splinter Review
The mechanism that builds return-address patching tables is broken. It assumes, after making a stub call, that the current assembler label is the call's return address. That's not true at all on some platforms, so there's a really gross callLabel hack. Even on x86/x64, it's broken on many potential edge cases because ADD_CALLSITE is missing.

First patch just paves the way for cleaning up by abstracting how calls are made.
Attachment #487854 - Flags: review?(dmandelin)
Test case that fails:

setDebug(true);
x = "notset";
function main() {
  /* The JSOP_STOP in a. */
  a = { valueOf: function () { trap(main, 25, "success()"); } };
  a + "";
  x = "failure";
}
function success() { x = "success"; }

main();
assertEq(x, "success");

dvander@dvander:~/mozilla/tracemonkey/js/src$ ./Debug32/js -m trap-self.js 
Assertion failure: failed to find call site, at ../methodjit/Retcon.cpp:88
Aborted
OS: Windows 7 → Windows XP
Comment on attachment 487854 [details] [diff] [review]
part 1: hide ABI gunk

>diff --git a/js/src/methodjit/BaseAssembler.h b/js/src/methodjit/BaseAssembler.h
>--- a/js/src/methodjit/BaseAssembler.h
>+++ b/js/src/methodjit/BaseAssembler.h
>@@ -173,14 +173,22 @@ class Assembler : public ValueAssembler
>     Label startLabel;
>     Vector<CallPatch, 64, SystemAllocPolicy> callPatches;
> 
>+    // For helping with calls.
>+    RegisterID  savedRegs[TotalRegisters];
>+    uint32      argCount;
>+    uint32      saveCount;
>+    uint32      pushCount;
>+    Registers::CallConvention callConvention;
>+#ifdef DEBUG
>+    bool        callIsAligned;
>+#endif

1-line comments on each of the above. "For helping with calls" is also
somewhat ambiguous.

>+    // Save all registers in the given mask.
>+    void saveRegs(uint32 volatileMask) {
>+        // Only one use per call.
>+        JS_ASSERT(saveCount == 0);
>+        // Must save registers before pushing arguments or setting up calls.
>+        JS_ASSERT(!callIsAligned);
>+        Registers set(volatileMask);
>+        while (!set.empty()) {
>+            JS_ASSERT(saveCount < TotalRegisters);
>+
>+            RegisterID reg = set.takeAnyReg();
>+            savedRegs[saveCount++] = reg;
>+            push(reg);
>+        }
>+    }

I like this function a lot: simple job, good name, good assertion coverage
with comments explaining why the assertions are there. Also, I think that
despite its name, takeAnyReg iterates over the regs in a fixed order, which
should give a predictable order for the regs, which should make code
inspection and debugging easier.

I do notice that saveRegs and restoreRegs are not used anywhere yet. Is
that because for existing stub calls, we always sync out the volatile
registers anyway?

>+    inline uint32 alignForCall() {
>+        uint32 total = (pushCount + saveCount) * sizeof(void *);
>+
>+        // If StackAlignment is a power of two, % is just two shifts.
>+        // 16 - (x % 16) gives alignment, extra % 16 handles total == 0.
>+        return (StackAlignment - (total % StackAlignment)) % StackAlignment;
>+    }

To me, it seems a bit better to make pushCount and saveCount (or perhaps
just a single value) be an argument--less state/order dependency, but 
just about as easy to use.

>+    // Prepare the stack for a call sequence. This must be called AFTER all
>+    // volatile regs have been saved, and BEFORE pushArg() is used. The stack
>+    // is assumed to be aligned to 16-bytes plus any pushes that occured via
>+    // saveVolatileRegs().
>+    void setupABICall(Registers::CallConvention convention, uint32 generalArgs) {

The comment about the constraints is good. The name of the function and the
descriptive comment could be more specific: I think this function just 
generates the stack adjustment instruction needed to ensure alignment.
'genStackBump'? It would be nice to separate out the storing of the
callConvention as well; that seems like a separate concern.

Another note about this function is that this way, we use push instructions
for storing all args/regs, and generate a stack bump just for alignment
changes. Would it be more efficient to generate one stack bump, and then
indexed stores for the saves/args? Just curious: it doesn't seem that
important and could be tried later.

>+    // Push an argument for a call.
>+    void pushArg(RegisterID reg) {

This would be better named 'storeArg', since it doesn't necessarily push.

>+    // High-level call helper, given an optional function pointer and a
>+    // calling convention.
>+    Call callWithABI(void *fun) {

The comment needs to say a bit more about what the preconditions are and
what this does in detail. 

>+        JS_ASSERT(callIsAligned);
>+        JS_ASSERT(argCount == pushCount + Registers::numArgRegs(callConvention));
>+
>+        Call cl = call();
>+        callPatches.append(CallPatch(cl, fun));
>+
>+        // Subtract arguments that went on the stack, as well as any alignment
>+        // added in setupABICall().
>+        unsigned spadjust = pushCount * sizeof(void *);
>+#if defined(JS_CPU_X86) || defined(JS_CPU_X64)
>+        spadjust += alignForCall();
>+#endif

That's a little yucky. If you want to have this function compensate for
stack adjustments made in setupABICall, then setupABICall should store
to a variable alongside pushCount, etc. Now that I think of it, some of
that stuff could be replaced (or augmented) by a single counter of how
much we have decremented the stack pointer so far.

>+    // Restore registers after a call.
>+    void restoreRegs() {
>+        JS_ASSERT(argCount == 0);
>+
>+        while (saveCount)
>+            pop(savedRegs[--saveCount]);
>+    }

I guess it works, but I get nervous about countdown loops with unsigned
iteration variables. Up to you.

>+    Call infallibleVMCall(void *ptr, uint32 frameDepth) {
>+        setupInfallibleVMFrame(frameDepth);
>+        return wrapVMCall(ptr);
>+    Call fallibleVMCall(void *ptr, jsbytecode *pc, uint32 frameDepth) {
>+        setupFallibleVMFrame(pc, frameDepth);
>+        return wrapVMCall(ptr);
>+    }

This part is nice and simple, but needs some documentation somewhere on
what a VM call is and what fallible and infallible actually mean. I think
"VM call" means a call to one of our normal stub functions (maybe), and
fallible means "can throw", therefore must store the PC.

>+    static inline uint32 numArgRegs(CallConvention convention) {
>+#if defined(JS_CPU_X86)
>+# if defined(JS_NO_FASTCALL)
>+        return 0;
>+# else
>+        return (convention == FastCall) ? 2 : 0;
>+# endif
>+#elif defined(JS_CPU_X64)
>+# ifdef _WIN64
>+        return 4;
>+# else
>+        return 6;
>+# endif
>+#elif defined(JS_CPU_ARM)
>+        return 4;
>+#endif
>+    }
>+
>+    static inline bool regForArg(CallConvention conv, uint32 i, RegisterID *reg) {
>+#if defined(JS_CPU_X86)
>+        static const RegisterID regs[] = {
>+            JSC::X86Registers::ecx,
>+            JSC::X86Registers::edx
>+        };
>+
>+# if defined(JS_NO_FASTCALL)
>+        return false;
>+# else
>+        if (conv == NormalCall)
>+            return false;
>+# endif
>+#elif defined(JS_CPU_X64)
>+# ifdef _WIN64
>+        static const RegisterID regs[] = {
>+            JSC::X86Registers::ecx,
>+            JSC::X86Registers::edx,
>+            JSC::X86Registers::r8,
>+            JSC::X86Registers::r9
>+        };
>+# else
>+        static const RegisterID regs[] = {
>+            JSC::X86Registers::edi,
>+            JSC::X86Registers::esi,
>+            JSC::X86Registers::edx,
>+            JSC::X86Registers::ecx,
>+            JSC::X86Registers::r8,
>+            JSC::X86Registers::r9
>+        };
>+# endif
>+#elif defined(JS_CPU_ARM)
>+        static const RegisterID regs[] = {
>+            JSC::ARMRegisters::r0,
>+            JSC::ARMRegisters::r1,
>+            JSC::ARMRegisters::r2,
>+            JSC::ARMRegisters::r3
>+        };
>+#endif
>+        JS_ASSERT(numArgRegs(conv) == JS_ARRAY_LENGTH(regs));
>+        if (i > JS_ARRAY_LENGTH(regs))
>+            return false;
>+        *reg = regs[i];
>+        return true;
>+    }
>+

This part is OK, but I think it could be abstracted a bit more
cleanly. In particular, if you put the regs[] arrays outside
the function, then you can write numArgRegs as pretty much just

    return sizeof(regs) / sizeof(regs[0]);

With those ifdefs living outside of regForArg, that function
would also become shorter and easier to read.

There are a couple of issues that make it not quite so easy,
though:

  - I don't think 0-length arrays are legal C++. So I guess you
    would have to put a dummy element at the end.

  - The choice of calling convention, which applies only on x86,
    complicates things. Keeping the special case for NormalCall
    for x86 only is probably the simplest thing there.
Attachment #487854 - Flags: review?(dmandelin)
> I do notice that saveRegs and restoreRegs are not used anywhere yet. Is
> that because for existing stub calls, we always sync out the volatile
> registers anyway?

Yes, they will get used in the typed arrays patch.

> 'genStackBump'? It would be nice to separate out the storing of the
> callConvention as well; that seems like a separate concern.

genStackBump is too specific. The word "setup" makes it pretty clear that it's something you should do to begin the call sequence. You need to know the calling convention at this point because you don't know how to align the stack otherwise.

> changes. Would it be more efficient to generate one stack bump, and then
> indexed stores for the saves/args?

I'm not sure, but I figured it's okay since MSVC uses pushes. It's something we can try without much difficulty. We could change VMFrame to include the maximum call stack we'd need, that way we wouldn't need to align on any calls, and could just move arguments into the right position. The problem there is that every time we want to include a new call depth, we'd have to update every VMFrame trampoline, which sounds really painful.

> This would be better named 'storeArg', since it doesn't necessarily push.

But, it does push :) if you change the order you call it, it won't work. 

> >+        JS_ASSERT(callIsAligned);
> >+        JS_ASSERT(argCount == pushCount + Registers::numArgRegs(callConvention));
> >+
> >+        Call cl = call();
> >+        callPatches.append(CallPatch(cl, fun));
> >+
> >+        // Subtract arguments that went on the stack, as well as any alignment
> >+        // added in setupABICall().
> >+        unsigned spadjust = pushCount * sizeof(void *);
> >+#if defined(JS_CPU_X86) || defined(JS_CPU_X64)
> >+        spadjust += alignForCall();
> >+#endif
> 
> That's a little yucky. If you want to have this function compensate for
> stack adjustments made in setupABICall, then setupABICall should store
> to a variable alongside pushCount, etc. Now that I think of it, some of
> that stuff could be replaced (or augmented) by a single counter of how
> much we have decremented the stack pointer so far.

Caching the result of alignForCall() might be able to replace pushCount; saveCount must be independent, and argCount should be otherwise following the ABI would be much trickier.

> I guess it works, but I get nervous about countdown loops with unsigned
> iteration variables. Up to you.

Me too, but the decrement is in the loop body, not the conditional, so it will never drop below 0.

> This part is OK, but I think it could be abstracted a bit more
> cleanly. In particular, if you put the regs[] arrays outside
> the function, then you can write numArgRegs as pretty much just

Luke tells me it's illegal to put array initializers in class bodies. I'm not sure where else they'd go, given that they should be inline.
The above patch didn't work on x64 and it turns out reserving the stack space up-front, then using moves, solved the problem very easily. It also gets rid of a bunch of state.

So basically this one should have all of the nits, in the end :)
Attachment #487854 - Attachment is obsolete: true
Attachment #488061 - Flags: review?(dmandelin)
Comment on attachment 488061 [details] [diff] [review]
part 1, take two, call abstraction

Looks good.

One micro-nit: Please add a comment to addressOfArg so that it doesn't get mistaken for a general-purpose function: it's specifically to get the address on the caller side after the stack has been fully adjusted.
Attachment #488061 - Flags: review?(dmandelin) → review+
Attached patch part 2: remove ADD_CALLSITE (obsolete) — Splinter Review
This patch removes ADD_CALLSITE, and replaces stubcc.call/stubCall with INLINE_STUBCALL and OOL_STUBCALL.

It also removes callLabel, since ADD_CALLSITE no longer relies on a potentially-incoherent masm.label().

This will break Windows x64 and Solaris. I'll fix the former in the next patch, and that should also fix the latter.
Attachment #488091 - Flags: review?(dmandelin)
This patch also breaks untrapping from inside a running script. It relied on ADD_CALLSITE in a weird way - rather than add a callsite for every opcode (and thus, doubling the size of the callsite vector), I think we can do better by keeping a list of traps. That'll be in the next patch too.
Note that the reason inline calls were trappable before, despite not having ADD_CALLSITE, is because there would always be an ADD_CALLSITE at the *top* of the next opcode.
Combines ADD_CALLSITE and stubcalls into one new macro. Inline calls (which are jumps) need special care.
Attachment #488091 - Attachment is obsolete: true
Attachment #488104 - Attachment is obsolete: true
Attachment #488222 - Flags: review?(dmandelin)
Attachment #488104 - Flags: review?(adrake)
Attachment #488091 - Flags: review?(dmandelin)
Attachment #488091 - Flags: review?(adrake)
Comment on attachment 488222 [details] [diff] [review]
part 2.1: remove ADD_CALLSITE

Looks good. Maybe add some 1-line comments to the members of InternalCallSite .
Attachment #488222 - Flags: review?(dmandelin) → review+
This patch finishes removing callLabel, letting Windows x64 work again.
Attachment #488281 - Flags: review?(m_kato)
Finally, this makes an edge case work again: untrapping from inside the trap that is being cleared.
Attachment #488336 - Flags: review?(dmandelin)
except, attached correctly
Attachment #488336 - Attachment is obsolete: true
Attachment #488338 - Flags: review?(dmandelin)
Attachment #488336 - Flags: review?(dmandelin)
Comment on attachment 488338 [details] [diff] [review]
part 4: fix untrapping from inside the same trap

r+ with s/\t/    / on Compiler.cpp and Retcon.cpp.
Attachment #488338 - Flags: review?(dmandelin) → review+
Attachment #488281 - Flags: review?(m_kato) → review+
Comment on attachment 488222 [details] [diff] [review]
part 2.1: remove ADD_CALLSITE

Looks good to me -- if for some reason in the future known types are not deterministic or are affected by changes in debug mode it may break things such as:

>          if (top->getKnownType() <= JSVAL_TYPE_INT32)
>              return;
>          prepareStubCall(Uses(1));
> -        stubCall(stubs::Pos);
> +        INLINE_STUBCALL(stubs::Pos);
Attachment #488222 - Flags: review?(adrake) → review+
The fix of this bug breaks tracemonkey build on solaris x86 compiled with Sun Studio. The attached patch can help to pass all the test. But I still have questions on function "setupABICall". Currently "setupABICall" only used once and "stackAdjust" inside this function is fixed for all platform. But if this "stackAdjust" is not fixed some day, I think there will be some problems. First, the return address will be changed with "stackAdjust". Second, the return function like "JaegerThrowpoline" or "InjectJaegerReturn" need to be changed according "stackAdjust". 

About signed int change, you can see bug 584219 comment #18.
Attachment #490824 - Flags: review?(dvander)
http://hg.mozilla.org/mozilla-central/rev/c498f1a7eb3b
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Comment on attachment 490824 [details] [diff] [review]
Patch to make Sun Studio on X86 work.

Would it be better to change the int32 typedef to include "signed" on SunStudio?
Attachment #493187 - Flags: review?(dvander) → review+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: