Closed Bug 587698 Opened 9 years ago Closed 9 years ago

JM: ICs for scripted calls

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: dvander, Assigned: dvander)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(3 files, 16 obsolete files)

94.32 KB, patch
dvander
: review+
Details | Diff | Splinter Review
1.73 KB, patch
dmandelin
: review+
Details | Diff | Splinter Review
11.98 KB, patch
dmandelin
: review+
Details | Diff | Splinter Review
Plan of attack: Guard on callee type, then guard on callee identity. If the identity check fails, go out to an IC stub.

The IC stub will check the type of the function.
 * Fastnative: Emit a new thunk to handle it.
   If a fastnative thunk already exists, patch it.
 * Scripted, uncompiled: compile it.
 * Scripted, uncompilable: take slow path.
 * Scripted, argc != nargs: take a slightly-less slow path, like CreateLightFrame.
 * Scripted, argc == nargs: take an even faster path, allowed by bug 586533, where some of the frame creation can be inlined.

Note that the guard is on the function object identity, not the JSFunction. If this turns out to be not-good in some benchmarks, we can have the IC generate an extra stub.

Actually inlining the frame creation will be a follow-up bug.
Err, allowed by bug 586886. 586533 gets us the ability to inline the entire frame.
Depends on: 586886
Attached patch WIP v-0 (obsolete) — Splinter Review
Sorry Brian, I lost today to IRL stuff. This is the result of some very late night hacking - I'll like to try and finish it up tonight & tomorrow.

This splits the calling ABI such that the caller is responsible for allocating space for a JSStackFrame, and initializing what's easiest on the caller side there. Members that are easiest for the callee are initialized on the callee side. To make stack limit checks fast, a call ensures that it can always push nslots + one JSStackFrame.

TODO: stack frame checks, arity mismatches, uncacheable calls, re-adding fastnative ICs.
Attached patch WIP v0 (obsolete) — Splinter Review
Check stack quota again, fixes infinite recursion.
Attachment #467334 - Attachment is obsolete: true
Attached patch WIP v1 (obsolete) — Splinter Review
Fixes: uncacheable calls now disable the IC

TODO: arity mismatches, re-adding fastnative ICs, scripted New, GC safety
Attachment #467341 - Attachment is obsolete: true
Will the fastnative ICs be using the same mechanism as the scripted IC?  I realized this afternoon that it is pretty straightforward to use the existing fastnative IC for scripted IC (bug 587707), and can make progress.
Attached patch WIP v2 (obsolete) — Splinter Review
Handles arity mismatches. For argc > nargs, it's a slow no-op (stub call). For argc < nargs, we move the JSStackFrame up and initialize undefined.

TODO: re-adding fastnative ICs, scripted New, GC safety
Attachment #467344 - Attachment is obsolete: true
Attached patch WIP v3 (obsolete) — Splinter Review
adds GC safety, scripted new support. passes trace-tests, fails 1 reftest in GC

TODO: fast-natives, ref+mochitests, cleanup
Attachment #467530 - Attachment is obsolete: true
Attached patch WIP v4 (obsolete) — Splinter Review
Re-added native call ICs, fixed remaining test failures.

This is almost ready for review, tomorrow I'll clean up and do perf analysis.
Attachment #467655 - Attachment is obsolete: true
Attached patch WIP v4.1 (obsolete) — Splinter Review
Tweak to heuristics that decide to stop patching. Will play with it more tmrw, but for now:

SS is a 3.5% win, or ~15ms in AWFY units.
Results: http://pastebin.mozilla.org/771918

v8 is a 9% win, or ~450ms in AWFY units
Results: http://pastebin.mozilla.org/771917

bhackett estimated that with his lazy frame init work we can double these gains. Exciting!
Attachment #467697 - Attachment is obsolete: true
Attached patch final patch (obsolete) — Splinter Review
Cleanup, passes tests.
Attachment #467702 - Attachment is obsolete: true
Attachment #468006 - Flags: review?(dmandelin)
Comment on attachment 468006 [details] [diff] [review]
final patch

I have mostly comment nits, but I think they are important.

It would be nice to have a block comment somewhere that explains the general strategy for call ICs. I think the division of labor between the caller's call IC and the callee code is the most important thing to understand.

For these guys:

    static const int32 NMAP_OFFSET_INVOKE = -1;
    static const int32 NMAP_OFFSET_ARITY  = -2;

it occurs to me that it would be much safer to make nmap be a C++ class with appropriate accessors. Same goes for the arrays of PICs that have counts stored in them and such. I'm not sure whether it's worthwhile to fix that right now, or if we should pick it up later. Use your judgment on that.

>diff --git a/js/src/assembler/jit/ExecutableAllocator.h b/js/src/assembler/jit/ExecutableAllocator.h
>--- a/js/src/assembler/jit/ExecutableAllocator.h
>+++ b/js/src/assembler/jit/ExecutableAllocator.h
>@@ -152,6 +152,9 @@ public:
> 
>     size_t available() const { return (m_pools.length() > 1) ? 0 : m_end - m_freePtr; }
> 
>+    /* Hack to minimize call IC size */
>+    ExecutablePool *next;

Out of curiosity, why is it necessary to reduce the call IC size? Otherwise, this seems like a fairly benign hack--we should clean it up if need to make further changes to the memory management of call IC stubs, but for now the code is very simple so I think it is OK.

>diff --git a/js/src/jsinterp.h b/js/src/jsinterp.h
>--- a/js/src/jsinterp.h
>+++ b/js/src/jsinterp.h
>@@ -75,6 +75,11 @@ enum JSFrameFlags {
>     JSFRAME_SPECIAL            = JSFRAME_DEBUGGER | JSFRAME_EVAL
> };
> 
>+namespace js { namespace mjit {
>+    class Compiler;
>+    class InlineFrameAssembler;
>+} }
>+
> /*
>  * JS stack frame, may be allocated on the C stack by native callers.  Always
>  * allocated on cx->stackPool for calls from the interpreter to an interpreted
>@@ -85,6 +90,8 @@ enum JSFrameFlags {
>  */
> struct JSStackFrame
> {
>+    friend class js::mjit::Compiler;
>+    friend class js::mjit::InlineFrameAssembler;

What are these friends for, exactly? Can they be done a different way? I have been tempted to add friend decls here in the past, but there always turned out to be another way.

>-void
>-mjit::Compiler::saveReturnAddress()
>+JSC::MacroAssembler::RegisterID
>+mjit::Compiler::getHWReturnAddress(Assembler &masm)

This has side effects, so maybe "pop" or "take" or something would be better naming.

>+static inline uint32
>+ComputeOffset(const JSC::CodeLocationCommon &end, const JSC::CodeLocationCommon &begin)
>+{
>+    return uint32(reinterpret_cast<uint8 *>(end.executableAddress()) -
>+                  reinterpret_cast<uint8 *>(begin.executableAddress()));
>+}

Some kind of "fooDiff" would probably be a more idiomatic name. Personally, I think an operator- overload would be good, but I'm not sure what our custom is on that yet.

>diff --git a/js/src/methodjit/Compiler.h b/js/src/methodjit/Compiler.h
>--- a/js/src/methodjit/Compiler.h
>+++ b/js/src/methodjit/Compiler.h
>@@ -90,11 +90,6 @@ class Compiler
>         Call call;
>         ic::MICInfo::Kind kind;
>         jsbytecode *jumpTarget;
>-        uint32 argc;
>-        uint32 frameDepth;
>-        Label knownObject;
>-        Label callEnd;
>-        JSC::MacroAssembler::RegisterID dataReg;
>         Jump traceHint;
>         MaybeJump slowTraceHint;
>         union {
>@@ -108,6 +103,31 @@ class Compiler
>             } tracer;
>         } u;
>     };
>+
>+    /* InlineFrameAssembler wants to see this. */
>+  public:
>+    struct CallGenInfo {
>+        CallGenInfo(uint32 argc)
>+          : argc(argc), constantThis(UndefinedValue())
>+        { }
>+        uint32       argc;
>+        DataLabelPtr funGuard;
>+        Jump         funJump;
>+        Call         hotCall;
>+        Call         oolCall;
>+        Label        joinPoint;
>+        Label        slowJoinPoint;
>+        Label        slowPathStart;
>+        Label        hotPathLabel;
>+        Jump         oolJump;
>+        RegisterID   funObjReg;
>+        RegisterID   funPtrReg;
>+        uint32       frameDepth;
>+        bool         isConstantThis;
>+        Value        constantThis;
>+    };

I think the above are mostly copies of the ones in the IC header? Maybe put in a comment pointing to that definition.
>diff --git a/js/src/methodjit/InlineFrameAssembler.h b/js/src/methodjit/InlineFrameAssembler.h
>+
>+#if !defined jsjaeger_inl_frame_asm_h__ && defined JS_METHODJIT && defined JS_MONOIC
>+#define jsjaeger_inl_frame_asm_h__
>+
>+#include "assembler/assembler/MacroAssembler.h"
>+#include "assembler/assembler/CodeLocation.h"
>+#include "methodjit/MethodJIT.h"
>+#include "CodeGenIncludes.h"
>+
>+namespace js {
>+namespace mjit {

// Brief comment here explaining the purpose of this class.

>+class InlineFrameAssembler {
>+    typedef JSC::MacroAssembler::RegisterID RegisterID;
>+    typedef JSC::MacroAssembler::Address Address;
>+    typedef JSC::MacroAssembler::Imm32 Imm32;
>+    typedef JSC::MacroAssembler::ImmPtr ImmPtr;

// 1-line comments saying what these are.

>+    Assembler &masm;
>+    bool       isConstantThis;
>+    Value      constantThis;
>+    uint32     frameDepth;
>+    uint32     argc;
>+    RegisterID funObjReg;
>+    jsbytecode *pc;
>+    uint32     flags;
>+
>+  public:
>+    Registers  tempRegs;

1-line comments for the members here, and the ones below that don't have one yet.

>-/* Compiler for generating fast paths for a MIC'ed native call. */
>-class NativeCallCompiler
>-{
>-    typedef JSC::MacroAssembler::Jump Jump;
>+    JSC::ExecutablePool *pools;
> 
>-    struct Patch {
>-        Patch(Jump from, uint8 *to)
>-          : from(from), to(to)
>-        { }
>+    /* Used for rooting and reification. */
>+    JSObject *fastGuardedObject;
>+    JSObject *fastGuardedNative;
>+    Value constantThis;
> 
>-        Jump from;
>-        uint8 *to;
>-    };
>+    uint32 argc : 16;
>+    uint32 frameDepth : 16;
> 
>-  public:
>-    Assembler masm;
>+    /* Function object identity guard. */
>+    JSC::CodeLocationDataLabelPtr funGuard;
> 
>-  private:
>-    /* :TODO: oom check */
>-    Vector<Patch, 8, SystemAllocPolicy> jumps;
>+    /* Starting point for all slow call paths. */
>+    JSC::CodeLocationLabel slowPathStart;
> 
>-  public:
>-    NativeCallCompiler();
>+    /* Inline to OOL jump, redirected by stubs. */
>+    JSC::CodeLocationJump funJump;
> 
>-    size_t size() { return masm.size(); }
>-    uint8 *buffer() { return masm.buffer(); }
>+    /* Offset to inline scripted call, from funGuard. */
>+    uint32 hotCallOffset   : 8;
>+    uint32 joinPointOffset : 8;
> 
>-    /* Exit from the call path to target. */
>-    void addLink(Jump j, uint8 *target) { jumps.append(Patch(j, target)); }
>+    /* Out of line slow call. */
>+    uint32 oolCallOffset   : 8;
> 
>-    /*
>-     * Finish up this native, and add an incoming jump from start
>-     * and an outgoing jump to fallthrough.
>-     */
>-    void finish(JSScript *script, uint8 *start, uint8 *fallthrough);
>+    /* Jump to patch for out-of-line scripted calls. */
>+    uint32 oolJumpOffset   : 8;
>+
>+    /* Offset for deep-fun check to rejoin at. */
>+    uint32 hotPathOffset   : 8;
>+
>+    /* Join point for all slow call paths. */
>+    uint32 slowJoinOffset  : 9;
>+
>+    RegisterID funObjReg : 5;
>+    RegisterID funPtrReg : 5;
>+    bool isConstantThis : 1;
>+    bool hit : 1;
>+    bool hasJsFunCheck : 1;
Comment on attachment 468006 [details] [diff] [review]
final patch

Oops, forgot to r+, modulo the request extra comments.
Attachment #468006 - Flags: review?(dmandelin) → review+
> it occurs to me that it would be much safer to make nmap be a C++ class with

Addressed this in part in bug 590275.

> Out of curiosity, why is it necessary to reduce the call IC size?

It's not really, I was just trying to minimize IC bloat and OOM checks. There's at most three (rarely ever more) ExecPools per call IC, and in theory there should be less pools than ICs. Actually saying this now I realize the hack is wrong if EPs are actually getting reused. Will investigate before pushing.

> What are these friends for, exactly? Can they be done a different way?

Both need to compute offsetof(JSStackFrame, member)

> This has side effects, so maybe "pop" or "take"

Good idea, I like "take"

> Some kind of "fooDiff" would probably be a more idiomatic name

Yeah, this makes sense. I'll try to do the operator overload.

Will comment as suggested.
(In reply to comment #13)
> > What are these friends for, exactly? Can they be done a different way?
> 
> Both need to compute offsetof(JSStackFrame, member)

That's exactly what I started creating friends for, too. But it turns out bhackett has been creating members like JSStackFrame::offsetCallObj. That way seems much cleaner.
Good point, will do that instead.
Pushed w/ nits and test failures fixed. Added a block-comment in MonoIC.cpp that overviews our call IC strategy. http://hg.mozilla.org/projects/jaegermonkey/rev/ebd2b956b565
Oh boy did this not stick. Backout: http://hg.mozilla.org/projects/jaegermonkey/rev/3be2436f23d9

Will investigate tomorrow.
This passes jstestbrowser at least. The problem there was that fp->fun was not getting set before the call to CheckStackQuota. The browser error reporting machinery has an assert that fp->fun == callee->getPrivate().

This is basically the previous patch rebased + 1 line of code. I don't think it needs to go through review again. But dvander should probably check the setup for CheckStackQuota to make sure I have it right. I'm gonna run Mochitest locally overnight but feel free to try it on Tinderbox tomorrow.
Attached patch Rebased again, with the 1 bugfix (obsolete) — Splinter Review
Attachment #469736 - Attachment is obsolete: true
Attached patch Patch, disables JSD mochitest (obsolete) — Splinter Review
This one almost works, but we think it has memory leaks.
Attachment #468006 - Attachment is obsolete: true
Attachment #469948 - Attachment is obsolete: true
Attachment #470083 - Attachment is obsolete: true
Attached patch fixes leaks (obsolete) — Splinter Review
Attachment #470084 - Attachment is obsolete: true
Sigh. Backed out again for tinderbox failures, even though we fixed 4 bugs since last try. Errors include:

- Win2003 build bustage -- MSVS8 apparently won't work with a local class in a header file.
- Crash on jQuery in mochitest-2
- Leaks in mochitest-5
Attached patch follow-up fix (obsolete) — Splinter Review
interdiff, fixes two bugs:

1) regs.sp was not updated, so SetValueRangeToUndefined was clobberific
2) regs.pc was not updated, so Interpret() could run off into the weeds
Attachment #470967 - Flags: review?(dmandelin)
Attachment #470967 - Flags: review?(dmandelin) → review+
http://hg.mozilla.org/projects/jaegermonkey/rev/2a8c72408e36

Still failing some tests, will fix in-place on the JM tree and land separately on TM once it's green.
Attached patch rebased, more fixes (obsolete) — Splinter Review
Rebased to slow native removal. Fixes another frame initialization bug and an underflow bug on x64. Only remaining failures appear to be leaks.
Attachment #470090 - Attachment is obsolete: true
Attachment #470967 - Attachment is obsolete: true
Found the leak. Holding the fastnative object rooted in the IC is causing the problem.
Attached patch fix (obsolete) — Splinter Review
Interdiff, resets call ICs while tracing scripts.
Attachment #471785 - Flags: review?(dmandelin)
Comment on attachment 471785 [details] [diff] [review]
fix

This approach doesn't work because it can purge ICs we're currently executing. We could walk the stack to determine if they're currently live, but that brings additional complexities that make this approach less desirable.

I'm going to investigate why rooting the native funobj is leaking.
Attachment #471785 - Attachment is obsolete: true
Attachment #471785 - Flags: review?(dmandelin)
What all leaked? We have known latent XPCOM leak bugs that JS can cause to bite.

In principle IC (C for Cache) should hold weak refs. But resetting on GC just in case is too pessimistic. How about running through the ICs between the mark and finalize phases and testing js_IsAboutToBeFinalized, and resetting the IC only if that returns true?

/be
(In reply to comment #30)

What leaked: http://pastebin.mozilla.org/781961

Thanks, I didn't know about js_IsAboutToBeFinalized. I'll try that.
Attached patch rebasedSplinter Review
Attachment #471674 - Attachment is obsolete: true
Attachment #472048 - Flags: review+
Attachment #472049 - Flags: review?(dmandelin) → review+
Attachment #472050 - Flags: review?(dmandelin) → review+
Blocks: 593882
Looks greenish, so tentatively: http://hg.mozilla.org/tracemonkey/rev/63ae1c2ece4b
Whiteboard: fixed-in-tracemonkey
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Duplicate of this bug: 551339
So you disabled all the tests for EventListenerService :(
And we're now actually failing those tests (even those without need for JSD)!
Uh... yeah.  Why exactly were content tests disabled without review from _someone_ on the content team?  :(
Since it's not in the attached patch, my best guess is that there was some kind of merge bustage.
The attached patch ( https://bugzilla.mozilla.org/attachment.cgi?id=472048&action=diff ) disables test_bug448602.html from content/events/test/Makefile.in
Sorry about that, this one's my fault. I probably wrongly assumed the purpose of the test was just to test something in JSD. I should have looked at the bug being tested, checked with you guys, and filed a bug to get it enabled once debug support was there for JM.

Is there anything I can do now to help fix things?
I think Olli has it sorted out in bug 689128.  It just took some sprinkling of JSAutoEnterCompartment pixie dust in various places to fix the various test failures that had crept in, and the patch there reenables the non-jsd parts of this test.
You need to log in before you can comment on or make changes to this bug.