Closed Bug 1437065 Opened 6 years ago Closed 6 years ago

wasm: Specialize monomorphic calls from JS to wasm

Categories

(Core :: JavaScript Engine: JIT, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
mozilla63
Tracking Status
firefox63 --- fixed

People

(Reporter: bbouvier, Assigned: bbouvier)

References

(Blocks 1 open bug)

Details

Attachments

(5 files, 11 obsolete files)

1.90 KB, patch
luke
: review+
Details | Diff | Splinter Review
10.46 KB, patch
jandem
: review+
Details | Diff | Splinter Review
8.30 KB, patch
luke
: review+
Details | Diff | Splinter Review
13.21 KB, patch
luke
: review+
Details | Diff | Splinter Review
68.59 KB, patch
jandem
: review+
luke
: review+
Details | Diff | Splinter Review
Bug 1319203 showed that v8 is still doing much better than us on monomorphic calls from JS to wasm. We should try to find if we can do something about this. In particular, we could have Ion unbox arguments for us, put the TLS data in the WasmTlsReg, and go through a very small specialized entry stub.
Assignee: nobody → bbouvier
Status: NEW → ASSIGNED
Attached patch monomorphic.patch (obsolete) — Splinter Review
WIP patch.

Doesn't handle exception unwinding nor frame iteration yet, so no perf numbers yet.
Attached patch wip.patch (obsolete) — Splinter Review
Passing almost all tests (need to update the profiler strings! but no crashes). I need to write more tests to ensure that the Instance can't be GC'd while the code runs (and implement this), but this shouldn't affect performance.

Result numbers, on a 400M iterations:

A. with 1 arg (add to a constant)
  Before: 2562ms
  After: 989ms (speedup = 61%)

B. with 2 args (add them up)
  Before: 2661ms
  After: 992ms (speedup = 62%)
Attachment #8996965 - Attachment is obsolete: true
This is something I've noticed while writing this patch: when calling LookupCode, if there's a codeRange outparam provided, it's not written to null if we don't find the Code. It hid an issue in my other patch, so I changed it a bit. Also used the longer name "codeRange" instead of cr, because it felt a bit weird (it's not a common abbreviation in the code base).
Attachment #8998267 - Flags: review?(luke)
Comment on attachment 8998267 [details] [diff] [review]
1.useful-lookupcode.patch

Review of attachment 8998267 [details] [diff] [review]:
-----------------------------------------------------------------

Good catch!
Attachment #8998267 - Flags: review?(luke) → review+
Attached patch rolledup.patch (obsolete) — Splinter Review
I've tried all day to make a test case involving a GC that would trigger a use-after-free (the callee being the freed entity), but I couldn't: null-ing the callee in the loop will cause a bailout, and the callee is always traced as it's either an argument, or a local, etc.

Christian, Gary, can you please fuzz this? Interesting test cases would involve calls to WebAssembly (and crashes). This applies cleanly on top of mozilla-inbound df05fdfe1af8.
Attachment #8998574 - Flags: feedback?(nth10sd)
Attachment #8998574 - Flags: feedback?(choller)
This is an automated crash issue comment:

Summary: Crash [@ CheckUse]
Build version: mozilla-central revision 17116905bc07+
Build flags: --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --disable-tests --enable-debug --enable-optimize
Runtime options: --fuzzing-safe --ion-offthread-compile=off --ion-eager

Testcase:

let lfDataArr = new Uint8Array([
  0, 97, 115, 109, 1, 0, 0, 0, 1, 143,
  0, 3, 96, 2, 127, 126, 0, 96, 1, 127,
  1, 126, 96, 0, 1, 127, 3, 132, 0, 3,
  0, 1, 2, 5, 131, 0, 1, 0, 1, 6, 129,
  0, 0, 7, 155, 0, 3, 5, 115, 116, 111,
  114, 101, 0, 0, 4, 108, 111, 97, 100,
  0, 1, 8, 97, 115, 115, 101, 114, 116,
  95, 48, 0, 2, 10, 174, 0, 3, 137, 128,
  128, 128, 0, 0, 32, 0, 32, 1, 61, 1,
  0, 11, 135, 128, 128, 128, 0, 0, 32,
  0, 50, 1, 0, 11, 142, 128, 128, 128,
  0, 0, 65, 255, 255, 3, 66, 128, 128,
  126, 16, 0, 65, 1, 11, 11, 171, 0, 2,
  0, 65, 0, 11, 16, 0, 1, 2, 3, 4, 5, 6,
  7, 8, 9, 10, 11, 12, 13, 14, 15, 0,
  65, 16, 11, 16, 240, 241, 242, 243,
  244, 245, 246, 247, 248, 249, 250,
  251, 252, 253, 254, 255
]);
try {
    lfModule = new WebAssembly.Module(lfDataArr.buffer);
    processModule(lfModule, "");
} catch (lfVare) {}
processModule(lfModule, "");
function processModule(module, jscode) {
    imports = {}
    instance = new WebAssembly.Instance(module, imports);
    for (let descriptor of WebAssembly.Module.exports(module)) {
        print(instance.exports[descriptor.name]())
    }
}

Backtrace:

received signal SIGSEGV, Segmentation fault.
CheckUse (usesBalance=0x7fffffffb7fc, use=0x7ffff49723e8, producer=0x7ffff49724a0) at js/src/jit/IonAnalysis.cpp:2681
#0  CheckUse (usesBalance=0x7fffffffb7fc, use=0x7ffff49723e8, producer=0x7ffff49724a0) at js/src/jit/IonAnalysis.cpp:2681
#1  js::jit::AssertBasicGraphCoherency (graph=..., force=force@entry=false) at js/src/jit/IonAnalysis.cpp:2800
#2  0x00000000007ee800 in js::jit::IonCompile (cx=<optimized out>, cx@entry=0x7ffff5f17000, script=<optimized out>, baselineFrame=baselineFrame@entry=0x7fffffffbcc8, osrPc=osrPc@entry=0x7ffff490d64c "\343\201B\aJ\377\377\377\346\v\003", recompile=<optimized out>, optimizationLevel=<optimized out>) at js/src/jit/Ion.cpp:2137
#3  0x00000000007eef99 in js::jit::Compile (cx=cx@entry=0x7ffff5f17000, script=script@entry=..., osrFrame=osrFrame@entry=0x7fffffffbcc8, osrPc=osrPc@entry=0x7ffff490d64c "\343\201B\aJ\377\377\377\346\v\003", forceRecompile=forceRecompile@entry=false) at js/src/jit/Ion.cpp:2378
#4  0x00000000007ef6f4 in BaselineCanEnterAtBranch (pc=0x7ffff490d64c "\343\201B\aJ\377\377\377\346\v\003", osrFrame=0x7fffffffbcc8, script=..., cx=0x7ffff5f17000) at js/src/jit/Ion.cpp:2555
#5  js::jit::IonCompileScriptForBaseline (cx=<optimized out>, frame=frame@entry=0x7fffffffbcc8, pc=pc@entry=0x7ffff490d64c "\343\201B\aJ\377\377\377\346\v\003") at js/src/jit/Ion.cpp:2613
#6  0x00000000006bca32 in js::jit::DoWarmUpCounterFallbackOSR (cx=<optimized out>, frame=0x7fffffffbcc8, stub=0x7ffff5f8d7b8, infoPtr=0x7fffffffbc80) at js/src/jit/BaselineIC.cpp:145
#7  0x00001ea990c27cf6 in ?? ()
[...]
#16 0x0000000000000000 in ?? ()
rax	0x0	0
rbx	0x7ffff4970850	140737296926800
rcx	0x0	0
rdx	0x7ffff4972360	140737296933728
rsi	0x7ffff4972508	140737296934152
rdi	0x7ffff4972428	140737296933928
rbp	0x7fffffffb880	140737488337024
rsp	0x7fffffffb7b0	140737488336816
r8	0x0	0
r9	0x7ffff4973400	140737296937984
r10	0x11	17
r11	0x0	0
r12	0x0	0
r13	0x7ffff49724a0	140737296934048
r14	0x7ffff4970f70	140737296928624
r15	0x7ffff49723e8	140737296933864
rip	0x79d39a <js::jit::AssertBasicGraphCoherency(js::jit::MIRGraph&, bool)+2538>
=> 0x79d39a <js::jit::AssertBasicGraphCoherency(js::jit::MIRGraph&, bool)+2538>:	cmpb   $0x4,0xf0(%rcx)
   0x79d3a1 <js::jit::AssertBasicGraphCoherency(js::jit::MIRGraph&, bool)+2545>:	je     0x79d4c9 <js::jit::AssertBasicGraphCoherency(js::jit::MIRGraph&, bool)+2841>
This is an automated crash issue comment:

Summary: Assertion failure: fallibleScope_ ([OOM] Cannot allocate a new chunk in an infallible scope.), at js/src/ds/LifoAlloc.cpp:186
Build version: mozilla-central revision 17116905bc07+
Build flags: --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --disable-tests --enable-debug --enable-optimize
Runtime options: --fuzzing-safe --ion-eager

Testcase:

var m = new WebAssembly.Module(wasmTextToBinary(`(module
    (import "a" "b" (result f64))
    (func (export "f") ${Array(200).join("(param f64)")} (result f64) (call 0))
)`));
var f = () => i.exports.f();
var i = new WebAssembly.Instance(m, {
    a: {
        b: f
    }
});
f();

Backtrace:

received signal SIGSEGV, Segmentation fault.
0x0000000000cf0ad8 in js::LifoAlloc::newChunkWithCapacity (this=this@entry=0x7ffff4957e20, n=n@entry=144) at js/src/ds/LifoAlloc.cpp:186
#0  0x0000000000cf0ad8 in js::LifoAlloc::newChunkWithCapacity (this=this@entry=0x7ffff4957e20, n=n@entry=144) at js/src/ds/LifoAlloc.cpp:186
#1  0x0000000000cf191f in js::LifoAlloc::getOrCreateChunk (this=this@entry=0x7ffff4957e20, n=n@entry=144) at js/src/ds/LifoAlloc.cpp:267
#2  0x0000000000677c2b in js::LifoAlloc::allocImpl (n=144, this=0x7ffff4957e20) at js/src/ds/LifoAlloc.h:582
#3  js::LifoAlloc::allocInfallible (this=0x7ffff4957e20, n=n@entry=144) at js/src/ds/LifoAlloc.h:672
#4  0x000000000086a551 in js::jit::TempAllocator::allocateInfallible (bytes=144, this=<optimized out>) at js/src/jit/JitAllocPolicy.h:45
#5  js::jit::TempObject::operator new (alloc=..., nbytes=144) at js/src/jit/JitAllocPolicy.h:164
#6  js::jit::MInstruction::operator new (alloc=..., nbytes=144) at js/src/jit/MIR.h:1116
#7  js::jit::MToDouble::New<js::jit::MDefinition*&> (alloc=...) at js/src/jit/MIR.h:4230
#8  js::jit::IonBuilder::inlineWasmCall (this=this@entry=0x7ffff4983270, callInfo=..., target=target@entry=0x7ffff4daef60) at js/src/jit/MCallOptimize.cpp:3841
#9  0x000000000089a77e in js::jit::IonBuilder::inlineNativeCall (this=this@entry=0x7ffff4983270, callInfo=..., target=0x7ffff4daef60) at js/src/jit/MCallOptimize.cpp:91
#10 0x00000000007e22de in js::jit::IonBuilder::inlineSingleCall (this=this@entry=0x7ffff4983270, callInfo=..., targetArg=targetArg@entry=0x7ffff4daef60) at js/src/jit/IonBuilder.cpp:4341
#11 0x00000000007e3e98 in js::jit::IonBuilder::inlineCallsite (this=this@entry=0x7ffff4983270, targets=..., callInfo=...) at js/src/jit/IonBuilder.cpp:4403
#12 0x00000000007e4246 in js::jit::IonBuilder::jsop_call (this=this@entry=0x7ffff4983270, argc=0, constructing=<optimized out>, ignoresReturnValue=<optimized out>) at js/src/jit/IonBuilder.cpp:5441
#13 0x00000000007de00a in js::jit::IonBuilder::inspectOpcode (this=this@entry=0x7ffff4983270, op=op@entry=JSOP_CALL) at js/src/jit/IonBuilder.cpp:2079
#14 0x00000000007df758 in js::jit::IonBuilder::visitBlock (this=this@entry=0x7ffff4983270, cfgblock=cfgblock@entry=0x7ffff5f85188, mblock=<optimized out>) at js/src/jit/IonBuilder.cpp:1572
#15 0x00000000007e012d in js::jit::IonBuilder::traverseBytecode (this=this@entry=0x7ffff4983270) at js/src/jit/IonBuilder.cpp:1489
#16 0x00000000007e0ef2 in js::jit::IonBuilder::build (this=this@entry=0x7ffff4983270) at js/src/jit/IonBuilder.cpp:864
#17 0x00000000007ee3f8 in js::jit::IonCompile (cx=<optimized out>, cx@entry=0x7ffff5f17000, script=<optimized out>, baselineFrame=baselineFrame@entry=0x7ffffff518d8, osrPc=osrPc@entry=0x0, recompile=<optimized out>, optimizationLevel=<optimized out>) at js/src/jit/Ion.cpp:2096
#18 0x00000000007eef99 in js::jit::Compile (cx=cx@entry=0x7ffff5f17000, script=script@entry=..., osrFrame=osrFrame@entry=0x7ffffff518d8, osrPc=osrPc@entry=0x0, forceRecompile=forceRecompile@entry=false) at js/src/jit/Ion.cpp:2378
#19 0x00000000007ef802 in BaselineCanEnterAtEntry (frame=0x7ffffff518d8, script=..., cx=0x7ffff5f17000) at js/src/jit/Ion.cpp:2494
#20 js::jit::IonCompileScriptForBaseline (cx=<optimized out>, frame=0x7ffffff518d8, pc=<optimized out>) at js/src/jit/Ion.cpp:2616
#21 0x000009edd3ff7a42 in ?? ()
#22 0x0000000000000000 in ?? ()
rax	0x0	0
rbx	0x7ffff4957e20	140737296825888
rcx	0x7ffff6c212dd	140737333301981
rdx	0x0	0
rsi	0x7ffff6ef0770	140737336248176
rdi	0x7ffff6eef540	140737336243520
rbp	0x7ffffff50d50	140737487637840
rsp	0x7ffffff50d20	140737487637792
r8	0x7ffff6ef0770	140737336248176
r9	0x7ffff7fe4780	140737354024832
r10	0x58	88
r11	0x7ffff6b977a0	140737332737952
r12	0x90	144
r13	0x7ffffff50d70	140737487637872
r14	0x7ffff4957e20	140737296825888
r15	0x7ffff4983000	140737297002496
rip	0xcf0ad8 <js::LifoAlloc::newChunkWithCapacity(unsigned long)+248>
=> 0xcf0ad8 <js::LifoAlloc::newChunkWithCapacity(unsigned long)+248>:	movl   $0x0,0x0
   0xcf0ae3 <js::LifoAlloc::newChunkWithCapacity(unsigned long)+259>:	ud2
(In reply to Christian Holler (:decoder) from comment #7)
> This is an automated crash issue comment:

This one is interesting: there was a missing ensureBallast() call, but there's something else. The test case contains a wasm function with 400 parameters. A LIR node can only contain up to 2**6 - 1 operands. I've put all the LIR operands, including stack args, inside the LIR node itself. Indeed, in codegen, we're constructing an exit frame, *then* allocate stack and put the stack args at the right locations. If there were different LIR nodes for the stack args (as done in the JIT with LStackArg and in wasm with LWasmStackArg), then the stack layout would resemble this:

| --- LWasmStackArg codegen, for each stack argument
| --- exit frame (to handle frame unwinding)
| ---> this is where the stack arguments are expected by the wasm callee to be

so we'd need to copy the stack arguments; this sounds very bad for performance. (or have a stack arg pointer (argsVP) argument passed to the wasm function, as is done by JS, but this is even a more intrusive change)

I thought maybe we could specialize the LIR node to get more than 2**6 - 1 operands, as the LPhi does, but in this case, it'd mean overriding LInstruction::getDef() et al. This wouldn't work, because the LInstruction class made these functions non-virtual *by design*.

Or, we could imagine that if there are less than 2**6-1 arguments, we set all the stack args as operands of the LIR node; if there are more, we do the stack copy.

I am also simply tempted to not inline calls with more than 2**6 - 1 operands. Luke, thoughts?
Flags: needinfo?(luke)
(In reply to Benjamin Bouvier [:bbouvier] from comment #8)
> I am also simply tempted to not inline calls with more than 2**6 - 1
> operands. Luke, thoughts?

Yes, this option sounds easiest and should not realistically occur in hot workloads.
Flags: needinfo?(luke)
Attached patch inline.patch (obsolete) — Splinter Review
Attachment #8998260 - Attachment is obsolete: true
Attachment #8998574 - Attachment is obsolete: true
Attachment #8998574 - Flags: feedback?(nth10sd)
Attachment #8998574 - Flags: feedback?(choller)
Attached patch inline.patch (obsolete) — Splinter Review
This is almost ready; I need to re-test on ARM because one big refactoring caused a few failures (fixed since then) on x86, but the code should be almost ready.
Attachment #8999173 - Attachment is obsolete: true
Attachment #8999184 - Flags: feedback?(luke)
Attachment #8999184 - Flags: feedback?(jdemooij)
Attached patch rolledup.patch (obsolete) — Splinter Review
Another rolled up patch for fuzzing, if you have some time. This applies cleanly on top of mozilla-inbound e574b6c899c1 . Thanks!
Attachment #8999185 - Flags: feedback?(nth10sd)
Attachment #8999185 - Flags: feedback?(choller)
Comment on attachment 8999184 [details] [diff] [review]
inline.patch

Review of attachment 8999184 [details] [diff] [review]:
-----------------------------------------------------------------

Cool :) I just skimmed the non-wasm/ parts. Some comments below but overall it makes sense.

::: js/src/jit/JitFrames.cpp
@@ +1184,5 @@
>      }
>  
> +    if (frame.isExitFrameLayout<FastWasmExitFrameLayout>()) {
> +        // Nothing to do: we can't have object arguments (yet!) and the callee
> +        // is traced elsewhere.

Can we assert this? Or assert we have a wasm callee or something?

::: js/src/jit/LIR.h
@@ +678,4 @@
>      // LPhi::numOperands() may not fit in this bitfield, so we only use this
>      // field for LInstruction.
>      uint32_t nonPhiNumOperands_ : 6;
> +    static_assert((2 << (6 - 1)) - 1 == MaxNumLInstructionOperands, "packing constraints");

|(1 << 6) - 1| ?

::: js/src/jit/MCallOptimize.cpp
@@ +59,5 @@
>          return InliningStatus_NotInlined;
>      }
>  
> +    bool isWasmCall = target->isWasmOptimized();
> +    if (!isWasmCall &&

We can't use JitInfo for this because of the jitInfo_/wasmJitEntry_ union in JSFunction?

@@ +3806,5 @@
> +        return InliningStatus_NotInlined;
> +
> +    // If there are too many arguments, don't inline (we won't be able to store
> +    // the arguments in the LIR node).
> +    if (sig.args().length() > MaxNumLInstructionOperands)

It might make sense to use a more conservative limit here, considering we could add quite a lot of MIR instructions below if we have 63 arguments? For instance 10 calls with 63 arguments would result in > 630 MIR instructions.

@@ +3826,5 @@
> +        MDefinition* arg = i >= callInfo.argc() ? *undefined : callInfo.getArg(i);
> +
> +        MInstruction* conversion = nullptr;
> +        switch (sig.args()[i].code()) {
> +          case wasm::ValType::Code::I32: {

Nit: don't need the {} here and below.

@@ +3827,5 @@
> +
> +        MInstruction* conversion = nullptr;
> +        switch (sig.args()[i].code()) {
> +          case wasm::ValType::Code::I32: {
> +            conversion = MToNumberInt32::New(alloc(), arg);

I think it would make sense to at least handle the matching type case without adding unnecessary MIR instructions. So for I32, something like:

if (arg->type() == MIRType::Int32)
    conversion = arg;
else
    conversion = MToNumberInt32...
Attachment #8999184 - Flags: feedback?(jdemooij) → feedback+
(In reply to Jan de Mooij [:jandem] from comment #13)
> I think it would make sense to at least handle the matching type case
> without adding unnecessary MIR instructions. So for I32, something like:
> 
> if (arg->type() == MIRType::Int32)
>     conversion = arg;
> else
>     conversion = MToNumberInt32...

Although this might not work well with phis and type analysis - you'd have to add a TypePolicy as well.
Comment on attachment 8999184 [details] [diff] [review]
inline.patch

Review of attachment 8999184 [details] [diff] [review]:
-----------------------------------------------------------------

Awesome patch!  I was expecting something hairier, but I think you've done an excellent job finding the right cut points.

::: js/src/jit/MCallOptimize.cpp
@@ +3827,5 @@
> +
> +        MInstruction* conversion = nullptr;
> +        switch (sig.args()[i].code()) {
> +          case wasm::ValType::Code::I32: {
> +            conversion = MToNumberInt32::New(alloc(), arg);

While I think MToNumberInt32 is semantically correct, I think we could use MTruncateToInt32() instead since semantically we're performing ToInt32():
  https://webassembly.github.io/spec/js-api/index.html#towebassemblyvalue
(It'd be nice to add some tests which repeatedly call an export passing all manner of wacky arguments to test semantic correctness.)

::: js/src/jit/MIR.cpp
@@ +6322,5 @@
> +                    const wasm::FuncExport& funcExport)
> +{
> +    wasm::ExprType retType = funcExport.funcType().ret();
> +
> +    MIRType resultType = retType.code() == wasm::ExprType::Code::Void

nit: I think the "Code::" isn't necessary?  (I didn't even know it was allowed for non-enum-classes, but apparently that's new with C++11, from local testing, which is cool.)

@@ +6342,5 @@
> +#ifdef DEBUG
> +bool
> +MIonToWasmCall::isConsistentFloat32Use(MUse* use) const
> +{
> +    return funcExport_.funcType().args()[use->index()].code() == wasm::ValType::Code::F32;

nit: ditto

::: js/src/jit/MIR.h
@@ +13991,5 @@
> +class MIonToWasmCall final :
> +    public MVariadicInstruction,
> +    public NoTypePolicy::Data
> +{
> +    WasmInstanceObject* instanceObj_;

Should this be CompilerGCPointer<WasmInstanceObject*> for asserty consistency?

::: js/src/wasm/WasmFrameIter.cpp
@@ +113,5 @@
>  {
>      Frame* prevFP = fp_;
>      fp_ = prevFP->callerFP;
> +
> +    if ((uintptr_t(fp_) & 0x1) == InlinedWasmCallBit) {

Could you add a static_assert to justify the 0x1 magic constant?  (E.g., you could just static_assert InlinedWasmCallBit == 0x1)

@@ +116,5 @@
> +
> +    if ((uintptr_t(fp_) & 0x1) == InlinedWasmCallBit) {
> +        // Has to be a fast call from Ion to wasm.
> +        MOZ_ASSERT(!LookupCode(prevFP->returnAddress, &codeRange_));
> +        MOZ_ASSERT(!codeRange_);

The CodeRange* outparam is only set when LookupCode() returns non-null, so I think the second assert should be removed and LookupCode() should be called with only the first arg.  (Also good hygiene to not have macros mutate persistent fields, even though in this case codeRange_ is set to null later.)

@@ +119,5 @@
> +        MOZ_ASSERT(!LookupCode(prevFP->returnAddress, &codeRange_));
> +        MOZ_ASSERT(!codeRange_);
> +
> +        unwoundIonCallerFP_ = (uint8_t*)(uintptr_t(fp_) & ~uintptr_t(InlinedWasmCallBit));
> +        unwoundIonFrameType_ = JitFrame_Exit;

nit: Could you add a beefy comment explaining this case and contrasting it to JitFrame_JSJitToWasm?

@@ +138,1 @@
>      MOZ_ASSERT(!(uintptr_t(fp_) & JitActivation::ExitFpWasmBit));

Hrm, ExitFpWasmBit == InlinedWasmCallBit.  I don't think there's a conflict here b/c the two cases are mutually-exclusive (phew!) based on whether the fp we're looking at is coming from Frame::callerFP or JitActivation::exitFP, but I think it would be good to syntactically represent this fact with:
 - a single global constant WasmExitOrJitEntryFpTag, subsuming InlinedWasmCallBit and JitActivation::ExitFpWasmBit
 - a beefy comment explaining the two cases

@@ +158,5 @@
>      MOZ_ASSERT(codeRange_);
>  
>      if (codeRange_->isJitEntry()) {
>          unwoundIonCallerFP_ = (uint8_t*) fp_;
> +        unwoundIonFrameType_ = JitFrame_JSJitToWasm;

nit: could you add "See JitFrame_Exit comment above" comment?

@@ +766,5 @@
>      const Code* code = LookupCode(callerPC, &callerCodeRange);
>  
> +    if (!code) {
> +        // Called via an inlined fast Ion to wasm call.
> +        MOZ_ASSERT(!callerCodeRange);

Similar to above, callerCodeRange is undefined when !code (nor do I see much value in making it defined), so I'd remove the assert.  However, couldn't you assert something about the JIT caller frame by casting callerFP to the appropriate JIT frame struct?

@@ +1108,5 @@
>  
>      if (unwoundCaller) {
>          callerFP_ = unwindState.fp;
>          callerPC_ = unwindState.pc;
> +        // If the untainted FP value is tagged and there's no code for the

nit: I'd s/untainted/original/

@@ +1181,5 @@
> +    if (!code_ && (uintptr_t(callerFP_) & 0x1) == InlinedWasmCallBit) {
> +        // The parent frame is an inlined wasm call, the tagged FP points to
> +        // the fake exit frame.
> +        MOZ_ASSERT(!codeRange_);
> +        // XXX is this ever reached?

Seems like it would (would be good to have test that hits a MOZ_CRASH() here to confirm).

::: js/src/wasm/WasmFrameIter.h
@@ +108,5 @@
>      enum class Fixed : uint32_t
>      {
>          None,            // default state, the pc is in wasm code
>          FakeInterpEntry, // slow-path entry call from C++ WasmCall()
> +        FakeIonEntry,    // fast-path entry call from Ion

I don't understand why this ExitReason needs to exist: it seems like either wasm::ProfilingFrameIter reports being in the wasm function or, if it samples while in the prologue/epilogue, it reports done()=true.

::: js/src/wasm/WasmStubs.cpp
@@ +885,5 @@
> +    }
> +
> +    // Use the wasmTlsReg as a temp, it's preserved by regalloc and
> +    // not used as an argument register for wasm.
> +    Register scratch = WasmTlsReg;

Instead of depending on this property implicitly, how about using ABINonArgReg which needs no comment?

@@ +895,5 @@
> +    masm.enterFakeExitFrameForIonToWasm(scratch, ExitFrameType::FastWasmJitEntry);
> +
> +    // Move stack arguments to their final locations.
> +    unsigned bytesNeeded = StackArgBytes(fe.funcType().args());
> +    bytesNeeded = StackDecrementForCall(WasmStackAlignment, masm.framePushed(), bytesNeeded);

The requirement for the second argument to this function is the number of bytes pushed since the stack pointer was known aligned.  That's why the WasmStubs.cpp overloads of StackDecrementForCall() pass |sizeof(wasm::Frame) + masm.framePushed()|.  Probably sizeof(jit stack frame) % WasmStackAlignment == 0, so there's no actual bug here, but I think it'd be good to add in sizeof(jit stack frame).

@@ +901,5 @@
> +        masm.reserveStack(bytesNeeded);
> +
> +    const FuncType& sig = fe.funcType();
> +
> +    ABIArgGenerator abi;

I think you can use ABIArgValTypeIter, using index() instead of 'i'.

@@ +914,5 @@
> +            firstStackArgIndex.emplace(i);
> +
> +        Address dst(StackPointer, arg.offsetFromArgBase());
> +
> +        const InlinedWasmCallStackArg& stackArg = stackArgs[i - *firstStackArgIndex];

I think this scheme implicitly relies on the invariant that, after the first ABIArg goes on the stack, all subsequent ABIArgs do.  I don't think that's true, ARM being a counter-example where int and float arg regs are filled up independently.  How about instead the caller passes in a Vector<Maybe<JitCallStackArg>> which can be simply indexed with abi.index().  If JitCallStackArg has an "invalid" state, then Maybe<> isn't needed, I suppose.  If I haven't misunderstood, it'd be good to add a test case for this.

@@ +964,5 @@
> +    masm.loadWasmPinnedRegsFromTls();
> +
> +#ifdef ENABLE_WASM_GC
> +    // XXX to generate this conditionally, we'd need to record a gcIsEnabled flag in the JitContext.
> +    SuppressGC(masm, 1, ABINonArgReg0);

IIUC, this will suppress GC in all optimized JIT->wasm calls on Nightly, so I assume the "XXX" means we have to fix this before landing.  I think JitContexts aren't the right place, since they are created locally, once in the helper thread.  I'd ask Jan what the right struct is that smuggles prefs to the compile thread is or if we can imply read cx->options().wasmGC() racily via CompileRuntime.

@@ +969,5 @@
> +#endif
> +
> +    // Actual call.
> +    const wasm::CodeTier& codeTier = inst.code().codeTier(inst.code().bestTier());
> +    uint32_t offset = codeTier.metadata().codeRanges[fe.interpCodeRangeIndex()].funcNormalEntry();

pre-existing: interpCodeRangeIndex() is the totally wrong name for this; it should be funcCodeRangeIndex() or something

@@ +1005,5 @@
> +        MOZ_CRASH("Limit");
> +    }
> +
> +    // Free args + frame descriptor.
> +    masm.freeStack(bytesNeeded + ExitFrameLayout::SizeWithFooter());

nit: perhaps masm.leaveExitFrame(bytesNeedd + ExitFrameLayout::Size()) for symmetry with the enterFakeExitFrame above?

::: js/src/wasm/WasmStubs.h
@@ +45,5 @@
> +// passed to GenerateInlinedWasmCall. Since the inlined call creates its own
> +// frame, it is its responsibility to put stack arguments to their expected
> +// locations; so the caller of GenerateInlinedWasmCall can put them anywhere.
> +
> +class InlinedWasmCallStackArg

nit: since "inlined wasm call` sounds vaguely like the wasm callee is being inlined, how about `JitCallStackArg` (w/ no "Wasm" since this is all in namespace wasm).

@@ +108,5 @@
> +// - all arguments passed on stack slot are alive as defined by a corresponding
> +//   InlinedWasmCallStackArg.
> +
> +extern void
> +GenerateInlinedWasmCall(jit::MacroAssembler& masm,

nit: similar to above, how about `GenerateDirectCallFromJit`?
Attachment #8999184 - Flags: feedback?(luke) → feedback+
Attached patch rollup.patch (obsolete) — Splinter Review
New rolled up patch for testing, applies cleanly on top of m-i 18195ca7647c, and compiles even in release mode (sigh).
Attachment #8999185 - Attachment is obsolete: true
Attachment #8999185 - Flags: feedback?(nth10sd)
Attachment #8999185 - Flags: feedback?(choller)
Attachment #9002434 - Flags: feedback?(nth10sd)
Attachment #9002434 - Flags: feedback?(choller)
Attached patch rollup.patch (obsolete) — Splinter Review
(duh, forgot the first patch; will land it very quick; sorry for the noise)
Attachment #9002434 - Attachment is obsolete: true
Attachment #9002434 - Flags: feedback?(nth10sd)
Attachment #9002434 - Flags: feedback?(choller)
Attachment #9002440 - Flags: feedback?(nth10sd)
Attachment #9002440 - Flags: feedback?(choller)
I've taken the drive-by fixes and put them into a different patch, to reduce the size of the actual interesting work.
Attachment #9002506 - Flags: review?(jdemooij)
Renames interpCodeRangeIndex to funcCodeRangeIndex everywhere.
Attachment #9002507 - Flags: review?(luke)
(In reply to Jan de Mooij [:jandem] from comment #13)
> ::: js/src/jit/MCallOptimize.cpp
> @@ +59,5 @@
> >          return InliningStatus_NotInlined;
> >      }
> >  
> > +    bool isWasmCall = target->isWasmOptimized();
> > +    if (!isWasmCall &&
> 
> We can't use JitInfo for this because of the jitInfo_/wasmJitEntry_ union in
> JSFunction?

That's right, hasJitInfo() is false for optimized wasm functions.

(In reply to Luke Wagner [:luke] from comment #15)
> ::: js/src/jit/MCallOptimize.cpp
> @@ +3827,5 @@
> > +
> > +        MInstruction* conversion = nullptr;
> > +        switch (sig.args()[i].code()) {
> > +          case wasm::ValType::Code::I32: {
> > +            conversion = MToNumberInt32::New(alloc(), arg);
> 
> While I think MToNumberInt32 is semantically correct, I think we could use
> MTruncateToInt32() instead since semantically we're performing ToInt32():
>   https://webassembly.github.io/spec/js-api/index.html#towebassemblyvalue
> (It'd be nice to add some tests which repeatedly call an export passing all
> manner of wacky arguments to test semantic correctness.)

Uh, that's right! It should speed up the case where the number isn't an exact Int32... I've already renamed these MIR nodes in the past to make their intent clearer, but it seems even the new names are still confusing...

> ::: js/src/wasm/WasmFrameIter.h
> @@ +108,5 @@
> >      enum class Fixed : uint32_t
> >      {
> >          None,            // default state, the pc is in wasm code
> >          FakeInterpEntry, // slow-path entry call from C++ WasmCall()
> > +        FakeIonEntry,    // fast-path entry call from Ion
> 
> I don't understand why this ExitReason needs to exist: it seems like either
> wasm::ProfilingFrameIter reports being in the wasm function or, if it
> samples while in the prologue/epilogue, it reports done()=true.

This was to make sure that we display *something* for the inlined call, otherwise the profiling iterators just see the wasm callee (no stub at all). I realize now that this is not a real trampoline, and the jit fake exit acts as the "link" frame here; in general jit exit frames don't appear in the JSJit profiling frame iterators, so I'll remove this exit reason.

> @@ +895,5 @@
> > +    masm.enterFakeExitFrameForIonToWasm(scratch, ExitFrameType::FastWasmJitEntry);
> > +
> > +    // Move stack arguments to their final locations.
> > +    unsigned bytesNeeded = StackArgBytes(fe.funcType().args());
> > +    bytesNeeded = StackDecrementForCall(WasmStackAlignment, masm.framePushed(), bytesNeeded);
> 
> The requirement for the second argument to this function is the number of
> bytes pushed since the stack pointer was known aligned.  That's why the
> WasmStubs.cpp overloads of StackDecrementForCall() pass |sizeof(wasm::Frame)
> + masm.framePushed()|.  Probably sizeof(jit stack frame) %
> WasmStackAlignment == 0, so there's no actual bug here, but I think it'd be
> good to add in sizeof(jit stack frame).

Correct me if I'm wrong, but since the JIT ABI is a special snowflake that gets its SP aligned *after* the call instruction, we don't need to add anything here?
(In reply to Benjamin Bouvier [:bbouvier] from comment #20)
> > > +    bytesNeeded = StackDecrementForCall(WasmStackAlignment, masm.framePushed(), bytesNeeded);
> > 
> > The requirement for the second argument to this function is the number of
> > bytes pushed since the stack pointer was known aligned.  That's why the
> > WasmStubs.cpp overloads of StackDecrementForCall() pass |sizeof(wasm::Frame)
> > + masm.framePushed()|.  Probably sizeof(jit stack frame) %
> > WasmStackAlignment == 0, so there's no actual bug here, but I think it'd be
> > good to add in sizeof(jit stack frame).
> 
> Correct me if I'm wrong, but since the JIT ABI is a special snowflake that
> gets its SP aligned *after* the call instruction, we don't need to add
> anything here?

You're right about SP being aligned after the call; that means is we need to add sizeof(jit frame pushed after return address) (sorry, I don't know what specific struct that is).
Attachment #9002507 - Flags: review?(luke) → review+
Attachment #9002506 - Flags: review?(jdemooij) → review+
Pushed by bbouvier@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3d252d6f57ec
Have LookupCode reset the codeRange out-param if it didn't find a Code; r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/0dba59a9d427
Drive-by fixes; r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/5f3c40ffdfed
Rename interpCodeRangeIndex to funcCodeRangeIndex; r=luke
Attached patch wip.patch (obsolete) — Splinter Review
updated patch, with some comments addressed. Needs more tests.
Attachment #8999184 - Attachment is obsolete: true
To make an informed decision about the maximum number of arguments after we give up direct inlining, I've made a histogram of the number of arguments per function in AngryBots (by just printf the num of signature's arguments in WebAssembly.Validate, passing the binary and filtering the result with uniq -c):

 count | num arguments
   1377 0
   4962 1
  11110 2
   6773 3
   3157 4
   1355 5
    875 6
    506 7
    290 8
    149 9
    108 10
    100 11
     52 12
     28 13
     35 14
     20 15
      7 16
     15 17
      6 18
      1 19
      5 20
      5 21
      2 22
      1 23
      3 25
      2 26
      1 27
      4 29
      1 30

From this data, I think 8 (included) is a reasonable maximum number of arguments before giving up inlining.
(In reply to Benjamin Bouvier [:bbouvier] from comment #25)
> From this data, I think 8 (included) is a reasonable maximum number of
> arguments before giving up inlining.

Agreed, I think max 8 is nice for this, also to avoid adding a lot of MIR nodes.
Attached patch 4.inline-wasm-calls.patch (obsolete) — Splinter Review
Attachment #9002828 - Attachment is obsolete: true
Comment on attachment 9002440 [details] [diff] [review]
rollup.patch

Clearing feedback? from me for now. I was away on PTO for most of last week and am still clearing backlog. Parts of this bug have already landed, so feel free to ping again if needed.
Attachment #9002440 - Flags: feedback?(nth10sd)
Comment on attachment 9002440 [details] [diff] [review]
rollup.patch

No additional crashes found :) feedback+
Attachment #9002440 - Flags: feedback?(choller) → feedback+
(In reply to Luke Wagner [:luke] from comment #15)
> ::: js/src/wasm/WasmStubs.cpp
> @@ +885,5 @@
> > +    }
> > +
> > +    // Use the wasmTlsReg as a temp, it's preserved by regalloc and
> > +    // not used as an argument register for wasm.
> > +    Register scratch = WasmTlsReg;
> 
> Instead of depending on this property implicitly, how about using
> ABINonArgReg which needs no comment?

Uhh, this is worse than what I thought: since all useful registers can be used by the LInstruction to store i32 argument values (including the FramePointer, in non-profiling mode!), we can't just select WasmTlsReg or ABINonArgReg0 (it could be use to hold an i32 argument value). We need to have the LInstruction require a temporary register that's *not* the FramePointer, but we can't specify such a constraint, so we have to require a fixed temp that's not the FramePointer.

It's a bit wasteful on x64/arm/arm64 where we could use the ScratchRegister; but trying to systemically use the ScratchRegister (on platforms that have it) results in failures on ARM, which uses it *a lot* in many masm methods. So I'll keep it simple, with the extra temporary allocation on all platforms. It should be fine; since we can't have more than 8 arguments anyway, and Ion can use almost all the registers, then it can hold all the arguments + this extra scratch all in registers (regalloc may spill before the LIns, though).

(The good news is: there was a test case I added in the patch that caught this, on x86! Although it took some time to debug and understand why some value was clobbered)
Attached patch 4-inline.patch (obsolete) — Splinter Review
Attachment #9002440 - Attachment is obsolete: true
Attachment #9003166 - Attachment is obsolete: true
Attachment #9003482 - Flags: review?(luke)
Attachment #9003482 - Flags: review?(jdemooij)
Attached patch tests.patchSplinter Review
Attachment #9003483 - Flags: review?(luke)
Comment on attachment 9003482 [details] [diff] [review]
4-inline.patch

Review of attachment 9003482 [details] [diff] [review]:
-----------------------------------------------------------------

Forgot to say: thanks for the initial feedback! Addressed all of it.

r? Jan for the non-wasm parts + GenerateDirectCallFromJit in WasmStubs.cpp (that's called by CodeGenerator)
r? Luke for the wasmy parts

::: js/src/jit/JitFrames.cpp
@@ +1172,5 @@
>  
> +    if (frame.isExitFrameLayout<DirectWasmJitCallFrameLayout>()) {
> +        // Nothing to do: we can't have object arguments (yet!) and the callee
> +        // is traced elsewhere.
> +        return;

To answer one of Jan's previous questions: there's not much we can assert here, since the callee is not on the stack :/ (We can't read the callee code address because there might be arguments on the stack between the frame stack and the callee address)
If really needed, we could push an extra word to have the callee on the stack (then we'd trace it without resorting to the pool), but it's still faster if we can avoid it.

::: js/src/jsdate.cpp
@@ +35,5 @@
>  
>  #include "builtin/String.h"
>  #include "js/Conversions.h"
>  #include "js/Date.h"
> +#include "js/LocaleSensitive.h"

(remark: had to be added b/o unified build issue)
Attached patch 4-inline.patchSplinter Review
Sorry for the noise; a try build revealed that the register allocator issue wasn't entirely fixed. We need to prevent both the scratch register *and* FramePointer (in non-profiling mode only) from being used by regalloc for argument inputs. This fixes it properly, adds beefy comments in the lowering function and strong asserts in the wasm::GenerateDirectCallFromJit function.
Attachment #9003482 - Attachment is obsolete: true
Attachment #9003482 - Flags: review?(luke)
Attachment #9003482 - Flags: review?(jdemooij)
Attachment #9003602 - Flags: review?(luke)
Attachment #9003602 - Flags: review?(jdemooij)
Comment on attachment 9003483 [details] [diff] [review]
tests.patch

Review of attachment 9003483 [details] [diff] [review]:
-----------------------------------------------------------------

Nice tests!
Attachment #9003483 - Flags: review?(luke) → review+
Comment on attachment 9003602 [details] [diff] [review]
4-inline.patch

Review of attachment 9003602 [details] [diff] [review]:
-----------------------------------------------------------------

\o/

::: js/src/jit/MacroAssembler-inl.h
@@ +310,5 @@
> +{
> +    linkExitFrame(cxreg, cxreg);
> +    moveStackPtrTo(FramePointer);
> +    Push(Imm32(int32_t(type)));
> +    orPtr(Imm32(wasm::ExitOrJitEntryFPTag), FramePointer);

This function has fairly non-obvious effects on FP that actually had me a bit confused when I was reading GenerateDirectCallFromJit().  How about:
 - killing enterFakeExitFrameForIonToWasm() and having GenerateDirectCallFromJit() call enterFakeExitFrame() instead,
 - putting the moveStackPtrTo(FramePointer) right before the entireFakeExitFrame() (b/c linkExitFrame() doesn't mutate SP)
 - putting the orPtr() after the enterFakeExitFrame()

Then we can see the whole story of FP in the caller which explains how the tagged FP gets into fp->callerFP.

::: js/src/wasm/WasmFrameIter.cpp
@@ +114,5 @@
>      Frame* prevFP = fp_;
>      fp_ = prevFP->callerFP;
> +
> +    static_assert(ExitOrJitEntryFPTag == 0x1, "lowest bit, so mask is 0x1");
> +    if ((uintptr_t(fp_) & 0x1) == ExitOrJitEntryFPTag) {

Actually, given the name "Tag" (which we can take to mean "bit flag"), I think we can just write "if (uintptr_t(fp) & ExitOrJitEntryFPTag)" w/o any static_assert necessary.  We also depend on this bit-flag-ness below when clearing the bit.

@@ +122,5 @@
> +        //
> +        // |---------------------|
> +        // |      JIT CODE       |
> +        // | JIT FAKE EXIT FRAME | <-- tagged fp_
> +        // |  WASM FUNCTION BODY | <-- prevFP (already unwound)

nit here and below: s/JIT CODE/JIT FRAME/ and s/WASM FUNCTION BODY/WASM FRAME/

@@ +841,5 @@
>      code_ = LookupCode(pc, &codeRange_);
> +
> +    if (!code_) {
> +        // This is a direct call from the JIT, FP is pointing to the tagged JIT
> +        // caller's frame.

How do we hit this case?  I would assume we only have a wasm exitFP once we've taken a proper exit in which case fp is the exit frame and fp->returnAddress would be a wasm function.

@@ +909,5 @@
>  
> +    // The frame pointer might be:
> +    // - in the process of tagging/untagging when calling into the JITs;
> +    // make sure it's untagged.
> +    // - tagged by an inlined JIT caller.

nit: s/inlined JIT caller/direct JIT call/

@@ +1164,5 @@
>      } else {
>          callerFP_ = unwindState.fp->callerFP;
>          callerPC_ = unwindState.fp->returnAddress;
> +        // See comment above.
> +        if ((uintptr_t(callerFP_) & ExitOrJitEntryFPTag) && !LookupCode(callerPC_))

Do we really need the !LookupCode() test for these two cases?  It seems like the wasmExitFP() case wouldn't have the bit set in the state (only in the JitActivation::exitFP()) and thus if the bit *was* set it *must* be the direct JIT call case.

@@ +1221,5 @@
>      }
>  
>      code_ = LookupCode(callerPC_, &codeRange_);
> +
> +    if (!code_ && (uintptr_t(callerFP_) & 0x1) == ExitOrJitEntryFPTag) {

Similar to above, this could perhaps just be (uintptr_t(callerFP_) & ExitOrJitEntryFPTag).

::: js/src/wasm/WasmFrameIter.h
@@ +269,5 @@
>  
> +// Bit set as the lowest bit of a frame pointer, used in two different mutually
> +// exclusive situations:
> +// - either it's a low bit tag in a FramePointer value read from a previous
> +// frame during iteration. This indicates the previous call frame has been set

nit: Instead of "from a previous frame during iteration", how about "from the Frame::callerFP of an inner wasm frame"?  And instead of "the previous call frame" how about "the frame pointed to by the tagged pointer".

@@ +272,5 @@
> +// - either it's a low bit tag in a FramePointer value read from a previous
> +// frame during iteration. This indicates the previous call frame has been set
> +// up by a JIT caller that directly called into a wasm function's body. This is
> +// only set in transient values of the FramePointer *register*, and thus it can
> +// not appear in a JitActivation's exitFP.

The second sentence doesn't quite make sense to me since the tagged pointer is stored in the callee's Frame::callerFP.  How about instead: "This is only stored in Frame::callerFP for a wasm frame called from JIT code, and thus it can not appear in a JitActivation's exitFP".

@@ +275,5 @@
> +// only set in transient values of the FramePointer *register*, and thus it can
> +// not appear in a JitActivation's exitFP.
> +// - or it's a low bit tag in a JitActivation's exitFP value that was set
> +// before a fast call from wasm to jit code, that is used as a marker to know
> +// that what's under a wasm exit actually belongs to a wasm call.

We set the bit on all exits from wasm code (in GenerateExitPrologue()), so this bullet should be "or it's the low bit tag set when exiting wasm code in JitActivation's exitFP."

::: js/src/wasm/WasmStubs.cpp
@@ +869,5 @@
> +                                uint32_t* callOffset)
> +{
> +    MOZ_ASSERT(!IsCompilingWasm());
> +
> +    size_t framePushedAtStart(masm.framePushed());

nit: does = initialization not work?

::: js/src/wasm/WasmStubs.h
@@ +41,5 @@
>                     const FuncExport& funcExport, const Maybe<jit::ImmPtr>& callee,
>                     bool isAsmJS, HasGcTypes gcTypesEnabled, CodeRangeVector* codeRanges);
>  
> +// An argument that will end up on the stack according to the system ABI, to be
> +// passed to GenerateDirectCallFromJit. Since the inlined call creates its own

nit: s/inlined call/direct JIT call/
Attachment #9003602 - Flags: review?(luke) → review+
Comment on attachment 9003602 [details] [diff] [review]
4-inline.patch

Review of attachment 9003602 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great! I didn't look at the Wasm changes in too much detail because Luke knows that code much better, but it all looks reasonable.

::: js/src/jit/CodeGenerator.cpp
@@ +13822,5 @@
> +                              scratch,
> +                              &callOffset);
> +
> +    // Add the instance object to the constant pool, so it is transferred to
> +    // the owning JitScript and so that it gets traced as long as the JitScript

Nit: s/JitScript/IonScript/ twice

::: js/src/jit/Lowering.cpp
@@ +5392,5 @@
> +    if (ins->type() == MIRType::Value)
> +        lir = allocateVariadic<LIonToWasmCallV>(ins->numOperands(), scratch, fp);
> +    else
> +        lir = allocateVariadic<LIonToWasmCall>(ins->numOperands(), scratch, fp);
> +

Uber nit: I'd remove this blank line to make the allocation + its OOM check one "unit".

::: js/src/jit/MCallOptimize.cpp
@@ +3809,5 @@
> +
> +    // If there are too many arguments, don't inline (we won't be able to store
> +    // the arguments in the LIR node).
> +    static_assert(8 <= MaxNumLInstructionOperands, "inlined arguments can all be LIR operands");
> +    if (sig.args().length() > 8)

Add a constant and use it in both the static_assert and this if-statement, to make sure they stay in sync. Something like: static constexpr MaxNumInlinedArgs = 8;

::: js/src/jit/MIR.h
@@ +14021,5 @@
> +
> +    static MIonToWasmCall* New(TempAllocator& alloc, WasmInstanceObject* instanceObj,
> +                               const wasm::FuncExport& funcExport);
> +
> +    void setArg(size_t i, MDefinition* arg) {

Nit: initArg, because initOperand should not be used for non-initializing sets

::: js/src/wasm/WasmStubs.cpp
@@ +872,5 @@
> +
> +    size_t framePushedAtStart(masm.framePushed());
> +
> +    if (profilingEnabled) {
> +        // FramePointer isn't volatile, manually preserve it.

Add something like " because it will be clobbered below." at the end of this. I was thinking "isn't the wasm callee going to restore FP before it returns anyway?"

@@ +908,5 @@
> +
> +        const JitCallStackArg& stackArg = stackArgs[iter.index()];
> +        switch (stackArg.tag()) {
> +          case JitCallStackArg::Tag::Imm32:
> +            masm.storePtr(ImmWord(stackArg.imm32()), dst);

So this stack slot is actually 8 bytes on 64-bit? storePtr makes sense then.

@@ +978,5 @@
> +    if (wasmGcEnabled)
> +        SuppressGC(masm, -1, WasmTlsReg);
> +#endif
> +
> +    masm.branchPtr(Assembler::Equal, FramePointer, Imm32(wasm::FailFP), masm.exceptionLabel());

We have tests for this exception path right? This stuff can be tricky.

::: js/src/wasm/WasmStubs.h
@@ +98,5 @@
> +    jit::FloatRegister fpu() const { MOZ_ASSERT(tag_ == Tag::FPU); return arg.fpu_; }
> +    const jit::Address& addr() const { MOZ_ASSERT(tag_ == Tag::Address); return arg.addr_; }
> +};
> +
> +using JitCallStackArgVector = Vector<JitCallStackArg, 0, SystemAllocPolicy>;

Nit: this is always stack-allocated right? Then s/0/4/ or so shouldn't hurt and will eliminate some mallocs.
Attachment #9003602 - Flags: review?(jdemooij) → review+
(In reply to Luke Wagner [:luke] from comment #36)
> @@ +841,5 @@
> >      code_ = LookupCode(pc, &codeRange_);
> > +
> > +    if (!code_) {
> > +        // This is a direct call from the JIT, FP is pointing to the tagged JIT
> > +        // caller's frame.
> 
> How do we hit this case?  I would assume we only have a wasm exitFP once
> we've taken a proper exit in which case fp is the exit frame and
> fp->returnAddress would be a wasm function.

In the trapping case, fp can be set to a wasm (function) frame, so the returnAddress (which belongs to the caller frame) may be a JIT caller; note the assertion just below checks fp->callerFP. I imagine the confusion arises from my comment being wrong: it's not FP which is pointing to the JIT caller's frame, it's FP->callerFP. Will update the text. Thanks!

> @@ +1164,5 @@
> >      } else {
> >          callerFP_ = unwindState.fp->callerFP;
> >          callerPC_ = unwindState.fp->returnAddress;
> > +        // See comment above.
> > +        if ((uintptr_t(callerFP_) & ExitOrJitEntryFPTag) && !LookupCode(callerPC_))
> 
> Do we really need the !LookupCode() test for these two cases?  It seems like
> the wasmExitFP() case wouldn't have the bit set in the state (only in the
> JitActivation::exitFP()) and thus if the bit *was* set it *must* be the
> direct JIT call case.

Very good catch. This was needed during the manufacturing of this patch, but not anymore, and I've wrapped my head around to prove it:
- the only place a tagged FP that doesn't come from a JIT call could show up is when calling SetExitFP: there's a small interval of instructions during which FP is tagged (before setting JitActivation->packedExitFP): https://searchfox.org/mozilla-central/source/js/src/wasm/WasmFrameIter.cpp#361-363
- SetExitFP is either called in an OOL path of an JIT exit call, or *after* generating the prologue in GenerateExitPrologue (called for builtin thunks, debug trap stub and the import interpreter stub).
- In all the cases, since we're not in the prologue/epilogue, it means that StartUnwinding will not unwind to the callerFP: https://searchfox.org/mozilla-central/source/js/src/wasm/WasmFrameIter.cpp#981-984
- So we end up in the "else" branch here, which does unwind to the caller FP. The caller FP is tagged if and only if the caller is the JIT.

So we're not observing the transient FP value in any cases. Phew! We can remove the LookupCode here, but we need to keep the first conditions exactly as written (in the unwoundCaller case, observe state.fp; in the not unwound case, observe callerFP_).
(In reply to Jan de Mooij [:jandem] from comment #37)

Thanks for the review!

> @@ +908,5 @@
> > +
> > +        const JitCallStackArg& stackArg = stackArgs[iter.index()];
> > +        switch (stackArg.tag()) {
> > +          case JitCallStackArg::Tag::Imm32:
> > +            masm.storePtr(ImmWord(stackArg.imm32()), dst);
> 
> So this stack slot is actually 8 bytes on 64-bit? storePtr makes sense then.

Right, the stack space is computed by ABIArgGenerator, which consumes 8 bytes for int32s.

> @@ +978,5 @@
> > +    if (wasmGcEnabled)
> > +        SuppressGC(masm, -1, WasmTlsReg);
> > +#endif
> > +
> > +    masm.branchPtr(Assembler::Equal, FramePointer, Imm32(wasm::FailFP), masm.exceptionLabel());
> 
> We have tests for this exception path right? This stuff can be tricky.

Yes! jit-tests/tests/wasm/ion-error-throw is the one that tests this. This was added for the generic jit entry, and very nicely, the code added at the time to handle wasm exceptions works here too.
And here are some newer benchmark results (all results in milliseconds, same benchmarks as the ones introduced in the generic jit entry: bug 1319203 comment 71 )

call known 0args time                : 363.04  | 276.49
call known 1arg time                 : 671.3   | 335.5
call known 2args time                : 773.52  | 372.42
call known 2args rectifying1 time    : 798.94  | 367.08
call generic 2args time              : 974.02  | 551.21
call generic 2args rectifying1 time  : 998.44  | 559.31
scripted getter time                 : 324.66  | 321.42
scripted getter rectifiying time     : 609.79  | 596.8
scripted setter time                 : 637.27  | 646.2
scripted setter rectifiying time     : 720.77  | 717.67
function.apply array time            : 1480.62 | 1480.59
function.apply array rectifying time : 1688.9  | 1681.57
function.apply args time             : 745.07  | 322.3
function.apply args rectifying time  : 773.32  | 324.69
function.call time                   : 792.84  | 372.61
function.call rectifying time        : 793.64  | 385.05

It's interesting to see that this optimization combines with other Ion optimizations:

- call known is the monomorphic case, so it's expected that it gets much better.
- call generic: this is basically doing:

    var arr = [instance.add, jsAdd]; // where instance.add is a wasm addition of 2 int32, jsAdd the same with |0 annotations
    for (var i = 0; i < ITERATIONS; i++) {
        arr[i%2](i, i+1);
    }

Now, Ion spots that the loop alternates between instance.add and jsAdd, decides to monomorphize the calls (thanks to a MFunctionDispatch), and thus to inline the wasm call, making the "generic" case very fast. (Before this patch, it was using an MCall and the LCallGeneric path, hence the name)

- Function.prototype.apply with the arguments object and Function.prototype.call can also inline a monomorphic target, it seems \o/
(forgot to say: in comment 40, left is before the patch, right after the patch)
https://hg.mozilla.org/mozilla-central/rev/79e9e6a709b0
https://hg.mozilla.org/mozilla-central/rev/b415b7af0bf6
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Depends on: 1522458
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: