Baldr: make calls between wasm and JIT code blazingly fast

RESOLVED FIXED in Firefox 60

Status


P3
normal
RESOLVED FIXED
2 years ago
11 months ago

People

(Reporter: luke, Assigned: bbouvier)

Tracking

Trunk
mozilla60
Points:
---

Firefox Tracking Flags

(firefox60 fixed)

Details

Attachments

(7 attachments, 24 obsolete attachments)

14.24 KB, patch
luke
: review+
30.40 KB, patch
jandem
: review+
33.84 KB, patch
luke
: review+
96.54 KB, patch
luke
: review+
jandem
: feedback+
66.22 KB, patch
jandem
: review+
7.65 KB, patch
jandem
: review+
14.23 KB, patch
jandem
: review+
(Reporter)

Description

2 years ago
Currently there is a pretty-fast path from wasm->JIT.  However, the path from JIT->wasm goes through a generic C++ WasmCall function which is probably 100x slower than a JIT JS->JS call.

An enabling fix to both is allowing wasm to share a JitActivation with other JIT code and killing WasmActivation.  This involves teaching the JIT stack iterators (profiling and normal) to walk into and out of wasm code and removing various other dependencies on WasmActivation.

With that fixed, JS should be able to call wasm more efficiently than calling (non-inlined) JS, and wasm calling JS should be about the same as JS<->JS.
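The call pattern at stake can be seen with a minimal standalone example; the bytes below hand-encode a trivial `add` module (this snippet is an illustration added for context, not from the bug itself):

```javascript
// (module (func (export "add") (param i32 i32) (result i32) ...))
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,        // magic + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,  // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                                 // one function of that type
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,  // export "add"
  0x0a, 0x09, 0x01, 0x07, 0x00,                           // code section, one body
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                     // local.get 0/1, i32.add, end
]);
const { add } = new WebAssembly.Instance(new WebAssembly.Module(bytes)).exports;

// Hot JS->wasm call loop: without this bug's work, every iteration goes
// through the generic C++ WasmCall path; with it, through the fast jit entry.
let sum = 0;
for (let i = 0; i < 1_000_000; i++)
    sum = add(sum, 1) | 0;
console.log(sum); // 1000000
```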
(Assignee)

Updated

2 years ago
Component: JavaScript Engine → JavaScript Engine: JIT
WasmActivations are not yet markable, and this is a simple "frame" to step over.  Would it be possible to instantiate a WasmActivation from JITted code?
Currently putting P3 on this, though I can imagine we want to drive this forward and land it in this release; in that case feel free to bump it to P1 (or P2 for the next release).
Priority: -- → P3
(Assignee)

Updated

2 years ago
Depends on: 1360211
(Assignee)

Comment 3

a year ago
I'm looking at this now. A few thoughts about implementation:

- we can start by using the most generic path in the JIT to handle fast calls from jit->wasm, i.e. LCallGeneric. To do this, we need a way to get the address of the fast-jit-to-wasm entry ("fast entry") from the JSFunction.
- a wasm JSFunction already has a native, which is WasmCall. It's used to know if a function is a wasm/asm.js export, in which case the JSFunction is extended and has 2 slots (one for the instance, one for the function index). With both, we could in theory look up the FuncExport that would lead to the fast entry code: but that would imply a VM call to do the binary search in the func exports table, and it would probably kill performance, so this is excluded. The VM call would handle tiering, though.
- the fast entry address could be stored as a new extended slot in the JSFunction, containing a private pointer being the fast entry address. In practice, we could be okay with this, since wasm baseline code would hopefully escape to ion code quickly. Doesn't sound too appealing though, and it wouldn't handle tiering.
- another idea I had was to use another kind of jumpTable, but used to store the addresses of the fast entry to functions (say, jsJitEntryTable). Then, an extended slot (or JSJitInfo) could store the address of the jsJitEntryTable and use that instead to find the fast entry address before calling it.

With this last idea and some other discussed with Luke, we could limit the amount of changes necessary to get from the JSFunction callee to the actual code pointer in JIT code, by:
- using JSJitInfo (or flags) to tell that a function is a wasm export
- having the JSFunction native field store the jsJitEntryTable address (= base + function index).

It would magically work because going from a JS Jit callee to the code takes two pointer dereferences (JSFunction -> JSScript -> ionOrBaselineEntry), and the wasm one would take two as well (JSFunction -> jit entry table -> fast entry).

Then, every place that knows how to handle a jit call to a regular JSFunction could also handle a fast call to wasm (so that's LCallGeneric, probably LCallKnown, and the arguments rectifier at least).

I'll try that next week. Happy to hear other ideas or comments about this one.
Assignee: nobody → bbouvier
Status: NEW → ASSIGNED
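The two-dereference symmetry described in comment 3 can be modeled in plain JS; none of these field names are the real SpiderMonkey ones, they just illustrate the shape of the lookup:

```javascript
// jsJitEntryTable: one slot per exported wasm function. The JSFunction stores
// the address of its slot (table base + function index), modeled here as the
// slot object itself.
const fastEntry = () => "wasm fast entry";
const slot0 = { rawJitEntry: fastEntry };
const jsJitEntryTable = [slot0];

// JS function:  JSFunction -> JSScript       -> jit code
// wasm export:  JSFunction -> jit table slot -> fast entry
const jsFun   = { u: { rawJitEntry: () => "jitted JS code" } }; // u = JSScript
const wasmFun = { u: slot0 };                                   // u = table slot

function genericCall(fun) {
    // Two dereferences, identical for both callee kinds: the JIT's call
    // sequence needs no wasm-specific branch.
    return fun.u.rawJitEntry();
}

console.log(genericCall(jsFun));   // jitted JS code
console.log(genericCall(wasmFun)); // wasm fast entry
```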
(Reporter)

Comment 4

a year ago
Overall, sounds great!

Technically, I think we might not need the jsJitEntryTable: if the fast jit-to-wasm entry of tier1 is stored in a JSFunction extended slot, the entry stub will call into a tier1 function prologue which will immediately call through Code::jumpTable() to reach the best tier.  But if we're doing two loads to get the callee *anyway*, then what you are proposing avoids "wasting" that second load by putting it to good use for tiering.
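The tiering benefit described here (every call loads the entry through the table, so patching one slot retiers all callers) can be sketched as follows; the names are illustrative:

```javascript
// Patchable jump-table tiering: callers always indirect through the table,
// so swapping a slot redirects them all at once, with no call-site patching.
const tier1 = () => "baseline tier";
const tier2 = () => "ion tier";

const jumpTable = [tier1];            // one slot per function, starts at tier 1
const callViaTable = i => jumpTable[i]();

console.log(callViaTable(0)); // baseline tier
jumpTable[0] = tier2;         // tier-2 compilation finished: patch the slot
console.log(callViaTable(0)); // ion tier
```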
(Assignee)

Comment 5

a year ago
Posted patch folded.wip.patch (obsolete) — Splinter Review
Folded patch (to be split later).

- can handle simple calls from jit to wasm (with i32 parameters) on x64
- ~can transition from wasm to jit frames in non-profiling iteration (to be confirmed with more testing; so far only a single stack printed from wasm called from jit has been checked)

Remaining work:

- handle errors thrown by wasm
- implement OOL paths when the Value conversion to primitive fails
- consider the {int64,NaN} {parameter,return values} cases
- a lot more testing

Considering that a lot is missing, I won't throw any perf numbers in yet, but just as a teaser: things are getting *much* better.
(Assignee)

Comment 6

a year ago
This trivial patch renames "entry" in wasm to "interpEntry", for perfect symmetry with exits.
Attachment #8925090 - Flags: review?(luke)
(Reporter)

Comment 7

a year ago
Comment on attachment 8925090 [details] [diff] [review]
1.rename.patch

Review of attachment 8925090 [details] [diff] [review]:
-----------------------------------------------------------------

Oops sorry for review delay.
Attachment #8925090 - Flags: review?(luke) → review+
(Assignee)

Updated

a year ago
Depends on: 1415224
(Assignee)

Updated

a year ago
Keywords: leave-open

Comment 8

a year ago
Pushed by bbouvier@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0422da76e0f6
Rename Entry -> InterpEntry in wasm code; r=luke

Comment 9

a year ago
Pushed by bbouvier@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/37a03f25e750
Also rename "entry trampoline" to "slow entry trampoline" in wasm tests; r=me
(Assignee)

Updated

a year ago
Depends on: 1417555
(Assignee)

Comment 11

a year ago
Posted patch wip.patch (obsolete) — Splinter Review
New wip! Can now handle failures in inline unboxing of ion arguments expected to be int32 wasm arguments. A lot of interesting remaining failures, many TODOs, etc.
Attachment #8925089 - Attachment is obsolete: true
(Assignee)

Comment 12

a year ago
Posted patch wip.patch (obsolete) — Splinter Review
And a new wip patch, which now also handles asm.js functions (except for functions that return SIMD; it might have been simpler to just forbid this, considering the discussions happening in bug 1416723). This is great, because I was afraid of poor test coverage of this feature, but now we have all the asm.js tests in addition to the wasm ones.

Almost all the tests are passing on x64 (except for one correctness issue in testZOOB and mysterious crashes on SIMD.js tests, which probably require some guarding or something). Next steps: check other platforms + async profiling iteration, then benchmark perf.
Attachment #8929120 - Attachment is obsolete: true
(Assignee)

Comment 13

a year ago
Posted patch wip.patch (obsolete) — Splinter Review
All tests passing on x64/x86 and baseline support enabled. Profiling is the next step.
Attachment #8929537 - Attachment is obsolete: true
(Assignee)

Comment 14

a year ago
Posted patch wip.patch (obsolete) — Splinter Review
(new rebased wip patch, mainly for bookkeeping)
Attachment #8930562 - Attachment is obsolete: true
(Assignee)

Comment 15

a year ago
This is a subset of the big WIP patch that can be landed independently.

This changes the layout of JSFunction so that there's a way for wasm functions to declare they have an optimized JIT entry; this is half the work so that MacroAssembler::loadJitCodeRaw [1] also works with wasm functions (the other half is in the WIP patch).

The flag that's introduced is not used yet in this patch, so it can be separated from the bigger chunk. A few helper functions have been introduced too (some might be unused here, but they will be used in the next few patches, which I'm hoping to upload in the next few days).

With this new field for JSFunction, we also need to make sure that we don't conflate jitInfo with the wasm jit entry, so this introduces JSFunction::hasJitInfo() -> bool, which explains the size of this patch. (I've used the z-renaming technique to make sure I caught all uses of jitInfo().)

[1] https://searchfox.org/mozilla-central/source/js/src/jit/MacroAssembler.cpp#1674
Attachment #8935023 - Flags: review?(jdemooij)
Comment on attachment 8935023 [details] [diff] [review]
2.jsfunc-layout.patch

Review of attachment 8935023 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!

::: js/src/jsfun.h
@@ +57,5 @@
>          INTERPRETED      = 0x0001,  /* function has a JSScript and environment. */
>          CONSTRUCTOR      = 0x0002,  /* function that can be called as a constructor */
>          EXTENDED         = 0x0004,  /* structure is FunctionExtended */
>          BOUND_FUN        = 0x0008,  /* function was created with Function.prototype.bind. */
> +        WASM_OPTIMIZED   = 0x0010,  /* asm.js/wasm function that has a jit entry */

Bug 1320403 \o/
Attachment #8935023 - Flags: review?(jdemooij) → review+
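The quoted flag word can be exercised with a small sketch; the constants mirror the jsfun.h excerpt above, but the two predicates are simplified illustrations, not the real JSFunction methods:

```javascript
// Flag values from the quoted jsfun.h excerpt:
const INTERPRETED    = 0x0001;
const CONSTRUCTOR    = 0x0002;
const EXTENDED       = 0x0004;
const BOUND_FUN      = 0x0008;
const WASM_OPTIMIZED = 0x0010;

const isWasmOptimized = flags => (flags & WASM_OPTIMIZED) !== 0;
// Simplified: the same storage holds either jitInfo or the wasm jit entry,
// so the flag is what disambiguates the two.
const hasJitInfo = flags => !isWasmOptimized(flags);

console.log(isWasmOptimized(EXTENDED | WASM_OPTIMIZED)); // true
console.log(hasJitInfo(INTERPRETED | CONSTRUCTOR));      // true
```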

Comment 17

a year ago
Pushed by bbouvier@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/959d2774b449
Change the layout of JSFunction to support wasm optimized entries; r=jandem
(Assignee)

Comment 19

a year ago
Posted patch tests.patch (obsolete) — Splinter Review
Some basic tests.
(Assignee)

Comment 20

a year ago
Posted patch wip.patch (obsolete) — Splinter Review
Updated wip. The main change has been to use the generic Callable prologue for the OOL arg coercion entry, as well as switching its ABI to pass the value to convert onto the stack.
Attachment #8935021 - Attachment is obsolete: true
(Assignee)

Comment 21

a year ago
Posted patch wip.patch (obsolete) — Splinter Review
First wip with profiling mode not blowing up the ARM simulator \o/ (see jit-tests/tests/wasm/ion-error-ool.js for instance: it tests both that we do take the optimized entry and what the stack trace should look like, in profiling and non-profiling modes)
Attachment #8940293 - Attachment is obsolete: true
(Assignee)

Comment 22

a year ago
Comment on attachment 8942300 [details] [diff] [review]
wip.patch

Review of attachment 8942300 [details] [diff] [review]:
-----------------------------------------------------------------

Luke, if you had time to have a first look, it'd be great! I'll be polishing before review: cleaning out code, having more tests pass (the profiling tests have started using the jit entry), enabling more ways into the optimized entry (Function.prototype.apply, getters/setters, scripted proxies (?), etc), doing more perf analysis and see if we can make the entry stub even faster, etc.

I guess there are three (four if tests are separated) big different chunks of code that can be looked at ~independently:
- the creation of the new JumpTables, mainly in WasmCode/WasmGenerator/WasmJS/WasmModule
- the enabling of the fast entry in the jit/ files (BaselineIC/CodeGenerator/IonBuilder/MIR/Lowering/MacroAssembler)
- the fast entries themselves, with frame iteration (WasmStubs/WasmFrameIter, as well as Stack.cpp with minimal changes and a lot of symmetry with the wasm->jit path \o/)

I'll split the patch this way next week.

::: js/src/wasm/WasmFrameIter.cpp
@@ +894,5 @@
> +        if (offsetInCode < codeRange->jitEntryPushedRA()) {
> +            // We haven't pushed the jit return address yet, thus the jit
> +            // frame is incomplete and we won't be able to unwind it; drop it.
> +            return false;
> +        } else if (offsetInCode < codeRange->jitEntryFPStart()) {

nit: else after return
Attachment #8942300 - Flags: feedback?(luke)
(Assignee)

Comment 23

a year ago
Comment on attachment 8942300 [details] [diff] [review]
wip.patch

Review of attachment 8942300 [details] [diff] [review]:
-----------------------------------------------------------------

Cancelling feedback now, a few things have changed since the last time I've posted this patch.
Attachment #8942300 - Flags: feedback?(luke)
(Assignee)

Comment 24

a year ago
Posted patch wip.patch (obsolete) — Splinter Review
Diff:

- more tests
- now handles all the call paths I could spot in ion and baseline: scripted getter/setter, F.p.apply with args/array, etc.
Attachment #8942300 - Attachment is obsolete: true
(Assignee)

Comment 25

a year ago
Posted patch rolled.patch (obsolete) — Splinter Review
Rolled up patch.
Attachment #8943553 - Attachment is obsolete: true
(Assignee)

Comment 26

a year ago
Posted patch rolled.patch (obsolete) — Splinter Review
Attachment #8944053 - Attachment is obsolete: true
(Assignee)

Comment 27

a year ago
Some results for compilation time and size of the new stubs. I optimized for speed rather than memory, and the memory impact is quite big, so this bumps up the priority of lazy stub generation or of sharing more parts per instance/process. The compilation hit is less worrying but still significant relative to the overall compilation time.

ANGRYBOTS:

- before:
GenerateStubs total size: 2113070 bytes
GenerateStubs total compile time: 8.73 ms
Total compilation time: 134.76 ms

- after:
jit entry stubs: 24.09 ms / 10600040 bytes
static stubs: 2.10 ms / 911 bytes
GenerateStubs total size: 12688574 bytes
GenerateStubs total compile time: 33.41 ms
Total compilation time: 163.88 ms

GODOT:

- before:
GenerateStubs total size: 2055006 bytes
GenerateStubs total compile time: 8.19 ms
Total compile time: 185.28 ms

- after:
jit entry stubs: 32.05 ms / 11500709 bytes
static stubs: 1.98 ms / 913 bytes
GenerateStubs total size: 13558878 bytes
GenerateStubs total compile time: 46.97 ms
Total compile time: 202.03 ms
(Assignee)

Comment 28

a year ago
I've done simple stats on these stubs, to find out where memory was taken:

- first of all, these programs have many more exported functions than imported functions: AngryBots has 19k exported functions vs 535 imported, Godot has 19.5k exported functions vs 244 imported ones. So we'll assume that having a lot of exported functions is the norm.
- the code size in stubs is generally taken by the arguments conversions, which test for/handle many JS::Value tags before giving up. In AngryBots, the median size of the subsection that just converts input arguments is 266 bytes (min: 0, max: 4327 bytes), with a total of ~7 MB just for these. OOL jumps (that call into the JitEntryOOL functions, which then pass to C++) take a total of ~1.3 MB. Since the overall memory overhead was ~10.6 MB for AB, most of it (~80%) is in these subsections.
- last interesting fact is that the number of unique signatures in exported functions is 234 for AngryBots and 177 for Godot.
- loading the TLS register from the JSFunction takes overall ~1 MB, and looks like the second thing that could be optimized after the arguments conversion.

Some (independent) ideas to reduce this memory size:
1. lazy stub generation, with a lazy jit entry that gets generated only if the jit entry itself is used. Since the jit entry would be used as soon as the baseline jit calls the function (i.e. after 10 calls), it might be worth just generating the jit entry when the interp entry itself is lazily generated.
2. have per-signature arguments conversion stubs (which could be per-process), since there are so few different unique exported function signatures (relative to the number of exported functions). Since this would introduce a new call (jit -> jit entry -> signature unboxing stub), it might kill the performance of the jit entry stub, though...
3. have the JSFunction store the TlsData pointer in a new extended private slot; this should divide the size of the TLS loading by a factor of two or three.

Now, the way to go ahead: I propose to split the current big patch into smaller patches for review (possibly even landing with the optimization *disabled*, since a single function controls it). Then we can work on memory improvements / lazy stub generation afterwards and enable it later.
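Idea 2 above (per-signature converter stubs) amounts to keying the generated stubs by signature so that ~19k exports share a couple hundred converters; a runnable sketch, with all names illustrative:

```javascript
// Cache of argument-conversion stubs, keyed by signature string.
const converterCache = new Map();

function getConverter(sig) {           // sig is e.g. "i32,i32->i32"
    let conv = converterCache.get(sig);
    if (!conv) {
        // In the engine this would be a freshly generated stub; a closure
        // stands in for it here.
        conv = args => args.map(Number);
        converterCache.set(sig, conv);
    }
    return conv;
}

// Many exported functions, few unique signatures (as measured above):
const exportedFuncs = Array.from({ length: 19000 }, (_, i) => ({
    sig: ["i32->i32", "i32,i32->i32", "f64->f64"][i % 3],
}));
for (const f of exportedFuncs)
    f.convert = getConverter(f.sig);

console.log(converterCache.size); // 3 converters serve 19000 exports
```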
(Assignee)

Comment 29

a year ago
Posted patch rolledup.patch (obsolete) — Splinter Review
Gary, Christian: can you please fuzz this patch? Anything with asm.js/wasm and --baseline-eager or --ion-eager is of particular interest to me.

It applies cleanly on top of mozilla-inbound 28a267774e17.
Attachment #8944055 - Attachment is obsolete: true
Attachment #8944421 - Flags: feedback?(gary)
Attachment #8944421 - Flags: feedback?(choller)
(Assignee)

Comment 30

a year ago
Posted patch 3.tests.patch (obsolete) — Splinter Review
Attachment #8940290 - Attachment is obsolete: true
Attachment #8944450 - Flags: review?(luke)
(Assignee)

Comment 31

a year ago
Posted patch 4.stubs.patch (obsolete) — Splinter Review
Attachment #8944451 - Flags: review?(luke)
(Assignee)

Comment 32

a year ago
Posted patch 5.entries.patch (obsolete) — Splinter Review
Attachment #8944452 - Flags: review?(luke)
(Assignee)

Comment 33

a year ago
Posted patch 6.usejitentry.patch (obsolete) — Splinter Review
And the final patch that binds everything together:

- make sure that the JSScript::offsetOf{Raw,ArgsCheck} jit entries are at the start of the JSScript struct (it's statically asserted in the previous patch), so the assembly routine that loads code is the same for jit code (load script/load raw or argscheck) and for the wasm fast entry (load jit entry table entry/load raw or argscheck)
- use the native jit entry where we would call scripted functions
Attachment #8944453 - Flags: review?(jdemooij)
(Assignee)

Comment 34

a year ago
Posted patch rolledup.patch (obsolete) — Splinter Review
Attachment #8944421 - Attachment is obsolete: true
Attachment #8944421 - Flags: feedback?(gary)
Attachment #8944421 - Flags: feedback?(choller)
(Assignee)

Comment 35

a year ago
Comment on attachment 8944455 [details] [diff] [review]
rolledup.patch

Gary, Christian: can you please fuzz this patch? Anything with asm.js/wasm and --baseline-eager or --ion-eager is of particular interest to me.

It applies cleanly on top of mozilla-inbound 28a267774e17.
Attachment #8944455 - Flags: feedback?(gary)
Attachment #8944455 - Flags: feedback?(choller)
(Assignee)

Comment 36

a year ago
Posted patch 7.jitconverters.patch (obsolete) — Splinter Review
This (incomplete) patch reduces the total jit entry stubs size from ~10 MB to ~2 MB, by splitting out the arguments conversion from the jit entries to per-instance (atm, but they could be per-process later) stubs keyed by signature.

This also regresses perf compared to the previous patch (since there's one more call/frame push/frame pop/ret) by 10% on micro-benchmarks, putting us in the ballpark of ~30% slower than v8 on the same micro-benchmarks.

Luke, does this sound like a good initial tradeoff for landing?
Flags: needinfo?(luke)
(Reporter)

Comment 37

a year ago
Landing the jitconverters patch would make sense if the lazy-stub approach would take a while, but it seems to me like lazy-stubs might not take that long.  Since lazy-stubs will actually be a net size/compile-time improvement and it allows us to maintain the optimal JIT->wasm call performance, I'm really in favor of that if it's a practical option.

(In reply to Benjamin Bouvier [:bbouvier] from comment #28)
> - first of all, these programs have many more exported functions than
> imported functions: AngryBots has 19k exported functions vs 535 imported,
> Godot has 19.5k exported functions vs 244 imported ones. So we'll assume
> this is the norm to have a lot of exported functions.

Yes; what creates so many exported functions is that every function used in an elem section of an imported/exported ("external") Table has to be marked as exported (since it might be called via Table.get()).
Flags: needinfo?(luke)
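The lazy-stub approach favored here can be sketched as a jit-entry slot that initially points at a stub-compiling trampoline and patches itself on first use; all names are illustrative:

```javascript
// Lazy jit entry: the table slot starts as a trampoline that compiles the
// real stub on first call, then patches the slot so later calls go direct.
function makeLazyEntry(table, index, compileStub) {
    return function lazyEntry(...args) {
        const real = compileStub();   // generate the stub on first call only
        table[index] = real;          // patch the slot for all future callers
        return real(...args);
    };
}

let compiles = 0;
const table = [];
table[0] = makeLazyEntry(table, 0, () => {
    compiles++;                       // stand-in for stub generation cost
    return x => x + 1;
});

console.log(table[0](1)); // 2 (compiles the stub)
console.log(table[0](2)); // 3 (stub already compiled)
console.log(compiles);    // 1
```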
This is an automated crash issue comment:

Summary: Assertion failure: data.s.payload.why == why, at js/Value.h:566
Build version: mozilla-central revision 3d23e6d98a09+
Build flags: --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --disable-tests --enable-debug --enable-optimize
Runtime options: --fuzzing-safe --ion-offthread-compile=off --ion-eager

Testcase:

evaluate(`
const libdir = "/srv/repos/mozilla-central/js/src/jit-test/lib/";
load(libdir + "asm.js");
var f = asmLink(asmCompile(USE_ASM + "function f(i) { i=i|0; if (!i) return; f((i-1)|0); f((i-1)|0); f((i-1)|0); f((i-1)|0); f((i-1)|0); } return f"));
try { evaluate(\`
var obj = new f({}, {
});
\`); } catch(exc) {}
`);

Backtrace:

received signal SIGSEGV, Segmentation fault.
#0  0x0000000000841a3c in JS::Value::isMagic (this=this@entry=0x7fffffffa950, why=why@entry=JS_UNINITIALIZED_LEXICAL) at js/Value.h:566
#1  0x0000000000865698 in JS::Value::isMagic (why=JS_UNINITIALIZED_LEXICAL, this=<optimized out>) at js/src/jit/SharedIC.cpp:2407
#2  js::WrappedPtrOperations<JS::Value, JS::Handle<JS::Value> >::isMagic (why=JS_UNINITIALIZED_LEXICAL, this=<synthetic pointer>) at js/Value.h:1312
#3  js::jit::DoTypeMonitorFallback (cx=0x7ffff5f16000, frame=0x7fffffffa988, stub=0x7ffff5fad160, value=..., res=...) at js/src/jit/SharedIC.cpp:2414
#4  0x00002e5d3698392b in ?? ()
[...]
#20 0x0000000000000000 in ?? ()
rax	0x0	0
rbx	0x7ffff5fad160	140737320243552
rcx	0x7ffff6c28a2d	140737333332525
rdx	0x0	0
rsi	0x7ffff6ef7770	140737336276848
rdi	0x7ffff6ef6540	140737336272192
rbp	0x7fffffffa7f0	140737488332784
rsp	0x7fffffffa7f0	140737488332784
r8	0x7ffff6ef7770	140737336276848
r9	0x7ffff7fe4780	140737354024832
r10	0x58	88
r11	0x7ffff6b9f750	140737332770640
r12	0x7ffff44934b8	140737291826360
r13	0x7fffffffa950	140737488333136
r14	0x7ffff5f16000	140737319624704
r15	0x7fffffffa988	140737488333192
rip	0x841a3c <JS::Value::isMagic(JSWhyMagic) const+44>
=> 0x841a3c <JS::Value::isMagic(JSWhyMagic) const+44>:	movl   $0x0,0x0
   0x841a47 <JS::Value::isMagic(JSWhyMagic) const+55>:	ud2
Actually this is the better-reduced test:


const USE_ASM = '"use asm";';
function asmCompile() {
    var f = Function.apply(null, arguments);
    return f;
}
function asmLink(f) {
    var ret = f.apply(null, Array.slice(arguments, 1));
    return ret;
}
evaluate(`
  var f = asmLink(asmCompile(USE_ASM + "function f(i) { i=i|0; if (!i) return; f((i-1)|0); f((i-1)|0); f((i-1)|0); f((i-1)|0); f((i-1)|0); } return f"));
    evaluate(\`
      var obj = new f({}, {});
    \`);
`);
(Assignee)

Updated

a year ago
Depends on: 1422043
(Reporter)

Comment 40

a year ago
Comment on attachment 8944450 [details] [diff] [review]
3.tests.patch

Review of attachment 8944450 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit-test/lib/wasm.js
@@ +190,5 @@
> +WasmHelpers._normalizeStack = stack => {
> +    var wasmFrameTypes = [
> +        {re:/^slow entry trampoline \(in wasm\)$/,                   sub:">"},
> +        {re:/^fast entry trampoline \(in wasm\)$/,                   sub:">>"},
> +        {re:/^fast entry trampoline slow conversions \(in wasm\)$/,  sub:">>ool"},

There is superficially some ambiguity between a single fast entry and two slow entry trampolines.  What about ">" as fast entry, "!>" as slow entry and "ool>" as fast entry with slow conversion?

@@ +192,5 @@
> +        {re:/^slow entry trampoline \(in wasm\)$/,                   sub:">"},
> +        {re:/^fast entry trampoline \(in wasm\)$/,                   sub:">>"},
> +        {re:/^fast entry trampoline slow conversions \(in wasm\)$/,  sub:">>ool"},
> +        {re:/^wasm-function\[(\d+)\] \(.*\)$/,                       sub:"$1"},
> +        {re:/^(fast|slow) FFI trampoline (to native )?\(in wasm\)$/, sub:"<"},

Unrelated and not blocking this bug, it'd be good to rename "FFI" to "exit" for symmetry.

::: js/src/jit-test/tests/wasm/profiling.js
@@ +84,5 @@
>          )`,
>          this,
> +        [
> +            ["", ">", "0,>", "<,0,>", `i64.${op},0,>`, "<,0,>", "0,>", ">", ""],
> +            ["", ">>", "0,>>", "<,0,>>", `i64.${op},0,>>`, "<,0,>>", "0,>>", ">>", ""],

It's a bit of a pain to have to worry about both fast and slow entry cases; that's why the exit testing collapses them both to "<".

What about: by default fast/slow are collapsed to one ">", but there is some mode/flag/arg that asks to distinguish so that you can dig in with tests if that's what you want to do.
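Applying the markers proposed above (">" fast, "!>" slow, "ool>" fast with slow conversion) to the quoted table gives this runnable sketch of the normalization:

```javascript
// Frame-name normalization table, as quoted from wasm.js but with the
// unambiguous markers proposed in this review.
const wasmFrameTypes = [
    { re: /^slow entry trampoline \(in wasm\)$/,                   sub: "!>" },
    { re: /^fast entry trampoline \(in wasm\)$/,                   sub: ">" },
    { re: /^fast entry trampoline slow conversions \(in wasm\)$/,  sub: "ool>" },
    { re: /^wasm-function\[(\d+)\] \(.*\)$/,                       sub: "$1" },
    { re: /^(fast|slow) FFI trampoline (to native )?\(in wasm\)$/, sub: "<" },
];

function normalizeFrame(frame) {
    for (const { re, sub } of wasmFrameTypes) {
        if (re.test(frame))
            return frame.replace(re, sub);
    }
    return frame;  // not a wasm trampoline frame: leave untouched
}

console.log(normalizeFrame("fast entry trampoline (in wasm)")); // ">"
console.log(normalizeFrame("wasm-function[3] (foo.js)"));       // "3"
```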
(Reporter)

Updated

a year ago
Depends on: 1432956
(Reporter)

Comment 41

a year ago
Comment on attachment 8944451 [details] [diff] [review]
4.stubs.patch

Review of attachment 8944451 [details] [diff] [review]:
-----------------------------------------------------------------

Looking really good; just need to take some more time to carefully read the logic in WasmStubs.cpp.  Here's comments so far, mostly nits.

::: js/src/vm/Stack.cpp
@@ +1738,5 @@
>      MOZ_ALWAYS_TRUE(wasm::StartUnwinding(state, &unwindState, &ignoredUnwound));
>  
>      void* pc = unwindState.pc;
> +
> +    // In the prologue/epilogue, FP might have been fixed up to the caller's FP,

Could you rename 'ignoredUnwound' to 'unwound' and put this whole code block inside "if (unwound)"?  This both asserts harder and makes it clear that when (I might have to do this soon b/c of a corner case in my trapping patch) StartUnwinding() is changed to not ever unwind (but rather return <TlsData*, Code*, caller fp, caller pc> tuples), this logic can go away.

::: js/src/wasm/WasmFrameIter.h
@@ +69,5 @@
>      const Code* code_;
>      const CodeRange* codeRange_;
>      unsigned lineOrBytecode_;
>      Frame* fp_;
> +    uint8_t* ionCallerFP_;

IIUC, ionCallerFP_ is set when unwoundAddressOfReturnAddress_ is set and only should be accessed when the iterator is done(), so could you rename this field unwoundIonCallerFP_ as well as the method and also assert done()?

@@ +216,5 @@
>  GenerateJitExitPrologue(jit::MacroAssembler& masm, unsigned framePushed, CallableOffsets* offsets);
>  void
>  GenerateJitExitEpilogue(jit::MacroAssembler& masm, unsigned framePushed, CallableOffsets* offsets);
>  void
> +GenerateJitEntryOolPrologue(jit::MacroAssembler& masm, unsigned framePushed,

nit: could you put a \n before this function decl?

::: js/src/wasm/WasmJS.h
@@ +146,5 @@
> +    static const unsigned INSTANCE_SLOT = 0;
> +    static const unsigned EXPORTS_OBJ_SLOT = 1;
> +    static const unsigned EXPORTS_SLOT = 2;
> +    static const unsigned SCOPES_SLOT = 3;
> +    static const unsigned INSTANCE_SCOPE_SLOT = 4;

Instead of making them all public, how about keeping them all private and having a
  static unsigned offsetOfInstanceSlot() {
    return NativeObject::getFixedSlotOffset(WasmInstanceObject::INSTANCE_SLOT);
  }
with the usual "// Read by JIT code:" comment.

::: js/src/wasm/WasmStubs.cpp
@@ +563,5 @@
> +    unsigned offset = frameSize + JitFrameLayout::offsetOfCalleeToken();
> +    masm.loadFunctionFromCalleeToken(Address(sp, offset), ScratchIonEntry);
> +
> +    // ScratchValIonEntry := callee->getExtendedSlot(WASM_INSTANCE_SLOT)
> +    //                    => ObjectValue(WasmInstanceObject)

As a possible follow-up optimization: if we kick the function-index out of the FunctionExtended object (being able to compute it on (rare) demand by looking up fun->u.wasm.jitEntry to find the CodeRange which would hold the function index), then we could store the TlsData* in the function object.  Maybe this would let us match v8's numbers?

::: js/src/wasm/WasmTypes.h
@@ +1017,5 @@
> +    uint32_t pushedRA;
> +
> +    // The offset to the first instruction for which FP is set to the jit
> +    // caller.
> +    uint32_t setFP;

Would it be possible to give WasmFrameIter.h a new GenerateJitEntry(Prologue|Epilogue) (symmetric to GenerateJitExit(Prologue|Epilogue)) that was called by GenerateJitEntry in WasmStubs.cpp, so that the JIT entries and unwinding were symmetric with the other prologues/epilogues?  Specifically this would allow these two fields to be static constants used by StartUnwinding() just like the other prologues, and remove the need for JitEntryOffsets.  This isn't a size savings because CodeRange has a fixed size, but I feel like the symmetry would be useful, especially if we build on this in the future with other types of JitEntries (monomorphic...) or other types of prologues.
Posted file Intermittent crash testcase (obsolete) —
#0  js::gc::IsCellPointerValid (cell=<optimized out>) at /home/ubuntu/shell-cache/js-dbg-64-dm-linux-1319203-c35-d9846b17a777-b0baaec09caf/objdir-js/dist/include/js/HeapAPI.h:437
#1  JS::GCPolicy<JS::Value>::isValid (value=...) at /home/ubuntu/shell-cache/js-dbg-64-dm-linux-1319203-c35-d9846b17a777-b0baaec09caf/objdir-js/dist/include/js/Value.h:1263
#2  JS::Rooted<JS::Value>::Rooted<JSContext*, JS::Value&> (initial=..., cx=<synthetic pointer>, this=0x7ffefb539a30) at /home/ubuntu/shell-cache/js-dbg-64-dm-linux-1319203-c35-d9846b17a777-b0baaec09caf/objdir-js/dist/include/js/RootingAPI.h:950
#3  CoerceInPlace_ToNumber (rawVal=0x7ffefb539bc0) at /home/ubuntu/trees/mozilla-central/js/src/wasm/WasmBuiltins.cpp:342
#4  0x0000122541b8e575 in ?? ()
#5  0x00007fd352616000 in ?? ()
#6  0x7ff8000000000000 in ?? ()
#7  0x0000000000000000 in ?? ()

With patch in comment 35, sitting on top of m-c rev b0baaec09caf, this testcase crashed 64-bit debug deterministic builds, but it is difficult to reproduce.

It may have been a combination of the standard flags (--ion-eager, etc.) or:

--fuzzing-safe --ion-aa=flow-insensitive --ion-sincos=off --non-writable-jitcode --no-unboxed-objects --gc-zeal=14 --no-threads --baseline-eager --ion-offthread-compile=off
Comment on attachment 8944455 [details] [diff] [review]
rolledup.patch

:bbouvier confirmed over IRC that the testcase in comment 42 is related. Clearing feedback? for now...
Attachment #8944455 - Flags: feedback?(gary)
This is an automated crash issue comment:

Summary: Assertion failure: i < numFuncs_, at js/src/wasm/WasmCode.h:475
Build version: mozilla-central revision 3d23e6d98a09+
Build flags: --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --disable-tests --enable-debug --enable-optimize
Runtime options: --fuzzing-safe

Testcase:

var g = newGlobal();
g.parent = this;
g.eval("Debugger(parent).onExceptionUnwind = function () {};");
lfModule = new WebAssembly.Module(wasmTextToBinary(`
(module
  (export "f" $func0)
  (func $func0 (result i32)
    i32.const -1
  )
)
`));
processModule(lfModule);
function processModule(module, jscode) {
    for (let i = 0; i < 2; ++i) {
        imports = {}
        instance = new WebAssembly.Instance(module, imports);
    }
}

Backtrace:

received signal SIGSEGV, Segmentation fault.
0x0000000000e31944 in js::wasm::JumpTables::getAddressOfJitEntry (i=0, this=<optimized out>) at js/src/wasm/WasmCode.h:475
#0  0x0000000000e31944 in js::wasm::JumpTables::getAddressOfJitEntry (i=0, this=<optimized out>) at js/src/wasm/WasmCode.h:475
#1  js::wasm::Code::getAddressOfJitEntry (i=0, this=<optimized out>) at js/src/wasm/WasmCode.h:517
#2  js::WasmInstanceObject::getExportedFunction (cx=cx@entry=0x7ffff5f16000, instanceObj=instanceObj@entry=..., funcIndex=0, fun=fun@entry=...) at js/src/wasm/WasmJS.cpp:1144
#3  0x0000000000e3246e in GetFunctionExport (cx=0x7ffff5f16000, instanceObj=instanceObj@entry=..., funcImports=..., funcImports@entry=..., exp=..., val=..., val@entry=...) at js/src/wasm/WasmModule.cpp:1012
#4  0x0000000000e38718 in CreateExportObject (exports=..., globalImports=..., memoryObj=..., tableObj=..., funcImports=..., instanceObj=..., cx=0x7ffff5f16000) at js/src/wasm/WasmModule.cpp:1116
#5  js::wasm::Module::instantiate (this=this@entry=0x7ffff4020c90, cx=0x7ffff5f16000, funcImports=..., funcImports@entry=..., tableImport=..., tableImport@entry=..., memoryImport=..., memoryImport@entry=..., globalImports=..., instanceProto=..., instance=...) at js/src/wasm/WasmModule.cpp:1233
#6  0x0000000000e39baa in Instantiate (cx=cx@entry=0x7ffff5f16000, module=..., importObj=..., importObj@entry=..., instanceObj=..., instanceObj@entry=...) at js/src/wasm/WasmJS.cpp:1036
#7  0x0000000000e3a76f in js::WasmInstanceObject::construct (cx=0x7ffff5f16000, argc=<optimized out>, vp=<optimized out>) at js/src/wasm/WasmJS.cpp:1061
#8  0x0000000000576d01 in js::CallJSNative (cx=0x7ffff5f16000, native=native@entry=0xe3a5b0 <js::WasmInstanceObject::construct(JSContext*, unsigned int, JS::Value*)>, args=...) at js/src/jscntxtinlines.h:291
#9  0x0000000000576e8b in js::CallJSNativeConstructor (cx=0x7ffff5f16000, native=0xe3a5b0 <js::WasmInstanceObject::construct(JSContext*, unsigned int, JS::Value*)>, args=...) at js/src/jscntxtinlines.h:324
[...]
#21 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at js/src/shell/js.cpp:9165
rax	0x0	0
rbx	0x7fffffffc460	140737488340064
rcx	0x7ffff6c282ad	140737333330605
rdx	0x0	0
rsi	0x7ffff6ef7770	140737336276848
rdi	0x7ffff6ef6540	140737336272192
rbp	0x7fffffffc4b0	140737488340144
rsp	0x7fffffffc420	140737488340000
r8	0x7ffff6ef7770	140737336276848
r9	0x7ffff7fe4780	140737354024832
r10	0x58	88
r11	0x7ffff6b9e7a0	140737332766624
r12	0x7fffffffc9a0	140737488341408
r13	0x7fffffffc4e0	140737488340192
r14	0x7ffff5f85760	140737320081248
r15	0x7fffffffc444	140737488340036
rip	0xe31944 <js::WasmInstanceObject::getExportedFunction(JSContext*, JS::Handle<js::WasmInstanceObject*>, unsigned int, JS::MutableHandle<JSFunction*>)+1604>
=> 0xe31944 <js::WasmInstanceObject::getExportedFunction(JSContext*, JS::Handle<js::WasmInstanceObject*>, unsigned int, JS::MutableHandle<JSFunction*>)+1604>:	movl   $0x0,0x0
   0xe3194f <js::WasmInstanceObject::getExportedFunction(JSContext*, JS::Handle<js::WasmInstanceObject*>, unsigned int, JS::MutableHandle<JSFunction*>)+1615>:	ud2


I've seen this assertion in many quite different cases, but they all seem to involve Debugger (but not all use onExceptionUnwind).
Comment on attachment 8944455 [details] [diff] [review]
rolledup.patch

Canceling feedback until the outstanding bugs are fixed.
Attachment #8944455 - Flags: feedback?(choller)
(Assignee)

Comment 46

a year ago
Posted patch additional-fixes.patch (obsolete) — Splinter Review
Additional fixes and test cases corresponding to the fuzz bugs. (Great finds Gary and Christian, thanks!)
(Assignee)

Comment 47

a year ago
Posted patch rolledup.patch (obsolete) — Splinter Review
New rolled up, to apply on m-i 401180:c68854a72847.

Gary, Christian, if you have some time to fuzz it again please, it would be fantastic, thanks!
Attachment #8944455 - Attachment is obsolete: true
Attachment #8945281 - Attachment is obsolete: true
Attachment #8946779 - Flags: feedback?(gary)
Attachment #8946779 - Flags: feedback?(choller)
I have refreshed the patch and am now fuzzing this one.
(Assignee)

Comment 49

a year ago
Comment on attachment 8944451 [details] [diff] [review]
4.stubs.patch

(working on a new simpler strategy)
Attachment #8944451 - Attachment is obsolete: true
Attachment #8944451 - Flags: review?(luke)
Comment on attachment 8946779 [details] [diff] [review]
rolledup.patch

Couldn't find any more related failures, feedback+ :)
Attachment #8946779 - Flags: feedback?(choller) → feedback+
(Assignee)

Updated

a year ago
Attachment #8944450 - Attachment is obsolete: true
Attachment #8944450 - Flags: review?(luke)
(Assignee)

Updated

a year ago
Attachment #8944452 - Attachment is obsolete: true
Attachment #8944452 - Flags: review?(luke)
(Assignee)

Updated

a year ago
Attachment #8944453 - Attachment is obsolete: true
Attachment #8944453 - Flags: review?(jdemooij)
(Assignee)

Updated

a year ago
Attachment #8946779 - Attachment is obsolete: true
Attachment #8946779 - Flags: feedback?(nth10sd)
(Assignee)

Updated

a year ago
Attachment #8944491 - Attachment is obsolete: true
(Assignee)

Comment 51

a year ago
Posted patch rolledup.patch (obsolete) — Splinter Review
(for bookkeeping purposes only)
(Assignee)

Comment 52

a year ago
Posted patch 3.tests.patchSplinter Review
Attachment #8949099 - Flags: review?(luke)
(Assignee)

Comment 53

a year ago
Posted patch 4.stub.patchSplinter Review
Implemented the new optimizations discussed:

- two loops: one doing the JS Value tag checking, then one doing the unboxing afterwards (avoids the exception / oolConvert stubs)
- remove FUNC_INDEX from the extended slots and store the tlsData pointer instead (the jsfun-associated bits are in the next patch). Delta: -3 loads or so
- also stop storing TlsReg on the stack. It needed to be there before because, in case of an exception, we'd reload it from the stack to get the jit exception handler. Now we just repeat the TlsData load sequence, since it's very short. Delta: -1 load
Attachment #8949103 - Flags: review?(luke)
(Assignee)

Comment 54

a year ago
Attachment #8949104 - Flags: review?(jdemooij)
(Reporter)

Updated

a year ago
Attachment #8949099 - Flags: review?(luke) → review+
(Reporter)

Comment 55

a year ago
Comment on attachment 8949103 [details] [diff] [review]
4.stub.patch

Review of attachment 8949103 [details] [diff] [review]:
-----------------------------------------------------------------

Really fantastic patch!  Very clean and well written.  I'm so happy to have this, our oldest and biggest perf sore spot, get fixed.

These days I'm not much of an expert on our value format; I'd really feel better if Jan could review just the masm sequence in the two unboxing loops in GenerateJitEntry() in WasmStubs.cpp for correctness and to see if there's any easy perf wins.

::: js/src/jit/arm/Assembler-arm.h
@@ +121,5 @@
>  static constexpr Register ABINonArgReg2 = r6;
>  
> +// This register may be volatile or nonvolatile. Avoid d15 which is the
> +// ScratchDoubleReg.
> +static constexpr FloatRegister ABINonArgFloat32Reg { FloatRegisters::d8, VFPRegister::Single };

I think ABINonArgFloat32Reg is dead now?

::: js/src/wasm/WasmBuiltins.cpp
@@ +352,5 @@
> +    JitActivation* activation = CallingActivation();
> +    JSContext* cx = activation->cx();
> +
> +    const Code& code = activation->wasmExitFP()->tls->instance->code();
> +    const FuncExport& fe = code.metadata(code.stableTier()).lookupFuncExport(funcIndex);

I think it'd be good to optimize this stub a bit since it could theoretically show up on a hot path.  What if the stub took:
 - the TlsData* (which I think is close at hand)
 - the function-export-index (which is a constant in the stub)

@@ +359,5 @@
> +    if (!args.reserve(fe.sig().args().length()))
> +        return false;
> +
> +    for (size_t i = 0; i < fe.sig().args().length(); i++)
> +        args.infallibleAppend(*(argv + i));

If 'argv' points to the arguments array, which we already assumed elsewhere is traced by the GC, then I don't think we need to copy them into an AutoValueVector.  To get a handle, you can use HandleValue::fromMarkedLocation to forcibly convert a Value* into a HandleValue.

::: js/src/wasm/WasmCode.cpp
@@ +724,5 @@
> +{
> +    // Note a fast jit entry has two addresses, to be compatible with
> +    // ion/baseline functions which have the raw vs checked args entries,
> +    // both used all over the place in jit calls. This allows the fast entries
> +    // to be compatible with jit code pointer loading routines.

Could you add a sentence explaining why the skip-arg-check entry is always identical to the normal jit entry?

::: js/src/wasm/WasmCode.h
@@ +457,5 @@
> +{
> +    using TablePointer = mozilla::UniquePtr<void*[], JS::FreePolicy>;
> +
> +    CompileMode mode_;
> +    TablePointer wasm_;

Instead of calling this the "wasm" jump table (it's all "wasm"), how about calling this tiering_?  (That also explains why it's null when there's no tiering.)  Could you propagate this name change, so, e.g., setWasmEntry -> setTieringEntry, wasmJumpTable -> tieringJumpTable, etc.

@@ +467,5 @@
> +  public:
> +    bool init(CompileMode mode, const CodeSegment& cs, const CodeRangeVector& codeRanges);
> +
> +    void setJitEntry(size_t i, void* target) const {
> +        // See comment in wasm::Module::finishTier2.

I was also interested to know why the same 'target' was used for both.  I asked for a comment in JumpTables::init, so perhaps here you can say "See comments in Module::finishTier2 and JumpTables::init"

@@ +486,5 @@
> +            wasm_.get()[i] = target;
> +    }
> +    void** wasm() const {
> +        return wasm_.get();
> +    }

Could you add an addSizeOfMisc() and call it from Code::addSizeOfMiscIfUnseen()?

::: js/src/wasm/WasmFrameIter.cpp
@@ +899,5 @@
> +#if defined(JS_CODEGEN_ARM) || defined(JS_CODEGEN_MIPS32) || defined(JS_CODEGEN_MIPS64)
> +        if (offsetFromEntry < PushedRetAddr) {
> +            // We haven't pushed the jit return address yet, thus the jit
> +            // frame is incomplete and we won't be able to unwind it; drop it.
> +            return false;

Why do we need to lose this frame?  It seems like we aren't using returnAddress anywhere here and, if we were, we could get it from state.lr.

::: js/src/wasm/WasmInstance.h
@@ +106,5 @@
> +    static constexpr size_t offsetOfJSJitExceptionHandler() {
> +        return offsetof(Instance, jsJitExceptionHandler_);
> +    }
> +    static constexpr size_t offsetOfTlsData() {
> +        return offsetof(Instance, tlsData_);

nit: offsetOfTlsData is dead now

::: js/src/wasm/WasmJS.cpp
@@ +1170,5 @@
>              return false;
>  
> +        fun.set(NewNativeFunction(cx, WasmCall, numArgs, name, gc::AllocKind::FUNCTION_EXTENDED,
> +                                  SingletonObject, JSFunction::WASM_FUN));
> +        optimized = true;

nit: could you move the 'optimized' above the fun.set() so that we're not doing anything before the null check.

::: js/src/wasm/WasmStubs.cpp
@@ +392,5 @@
> +}
> +
> +// Load instance's TLS from the callee.
> +static void
> +GenerateJitEntryLoadTls(MacroAssembler& masm, unsigned frameSize)

Since it's so short now, how about inlining it where it's useful to see the context (where we are relative to the prologue, etc).

@@ +452,5 @@
> +
> +    unsigned bytesNeeded = Max(normalBytesNeeded, oolBytesNeeded);
> +
> +    // Note the jit caller uses ensures the stack is aligned *after* the call
> +    // instruction.

s/uses //

@@ +457,5 @@
> +    unsigned frameSize = StackDecrementForCall(WasmStackAlignment, 0, bytesNeeded);
> +
> +    // Reserve stack space for wasm ABI arguments and TLS register, set up like
> +    // this:
> +    // <-- ABI args | padding | TlsReg

nit: I think this comment is now out of date (w.r.t TlsReg)

@@ +463,5 @@
> +
> +    GenerateJitEntryLoadTls(masm, frameSize);
> +
> +    if (fe.sig().hasI64ArgOrRet()) {
> +        masm.call(SymbolicAddress::ReportInt64JSCall);

Instead of having ReportInt64JSCall and this special case, could we have CanBeJitOptimized() return false when hasI64ArgOrRet?

::: js/src/wasm/WasmTypes.h
@@ +1053,5 @@
>      enum Kind {
>          Function,          // function definition
>          InterpEntry,       // calls into wasm from C++
> +        ImportJitExit,     // fast-path calling from wasm into jit code
> +        JitEntry,          // calls into wasm from jit code

nit: How about having the order be: InterpEntry, JitEntry, ImportJitExit, ImportInterpExit?

@@ +1153,5 @@
>      bool hasReturn() const {
> +        return isFunction() ||
> +               isImportExit() ||
> +               kind() == OldTrapExit ||
> +               isDebugTrap();

I think perhaps you can undo this hunk and the preceding hunk that changes the comment?
Attachment #8949103 - Flags: review?(luke)
Attachment #8949103 - Flags: review+
Attachment #8949103 - Flags: feedback?(jdemooij)
Blocks: 1407535
Comment on attachment 8949103 [details] [diff] [review]
4.stub.patch

Review of attachment 8949103 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM! Some optimization ideas below.

::: js/src/wasm/WasmStubs.cpp
@@ +558,5 @@
> +            // fallthrough:
> +
> +            masm.bind(&storeBack);
> +            masm.boxDouble(scratchF, scratchV, scratchF);
> +            masm.storeValue(scratchV, Address(sp, jitArgOffset));

Some perf suggestions:

* If this path is hot (maybe the int32 -> double case?) we could optimize it to do masm.storeDouble with our current boxing format. We could add masm.boxDouble(FloatRegister, const Address&) for that.

* For the cases where you're storing a known constant, you could do something like this, to bypass the FP registers entirely:

  masm.storeValue(DoubleValue(JS::GenericNaN()), Address(...));
  masm.jump(&next);

The same applies to the null/undefined -> int32 path.

* Instead of boxing in a ValueOperand, then storing the ValueOperand, you might be able to use masm.storeValue(JSValueType, payloadReg, Address) defined here:

https://searchfox.org/mozilla-central/rev/a539f8f7fe288d0b4d8d6a32510be20f52bb7a0f/js/src/jit/x64/MacroAssembler-x64.h#121
Attachment #8949103 - Flags: feedback?(jdemooij) → feedback+
Comment on attachment 8949104 [details] [diff] [review]
5.use-jit-entry.patch

Review of attachment 8949104 [details] [diff] [review]:
-----------------------------------------------------------------

Very nice, some nits below. I'll go over this a second time in a few hours and r+ if I don't see anything serious.

::: js/src/jit/BaselineIC.cpp
@@ +1719,5 @@
>      // Attach a stub if the script can be Baseline-compiled. We do this also
>      // if the script is not yet compiled to avoid attaching a CallNative stub
>      // that handles everything, even after the callee becomes hot.
> +    if (((target->hasScript() && target->nonLazyScript()->canBaselineCompile()) ||
> +        (target->isNativeWithJitEntry())) &&

Nit: remove the parentheses around target->isNativeWithJitEntry()

::: js/src/jit/IonCacheIRCompiler.cpp
@@ +1086,5 @@
>  
> +    // The getter currently has a jit entry or a non-lazy script. We will only
> +    // relazify when we do a shrinking GC and when that happens we will also
> +    // purge IC stubs.
> +    MOZ_ASSERT(target->hasJitEntry());

Nit: the comment says "has a jit entry or a non-lazy script" but we assert just hasJitEntry(). So maybe we should say "wasm entry or a non-lazy script"?

@@ +2109,5 @@
>  
>      // Check stack alignment. Add sizeof(uintptr_t) for the return address.
>      MOZ_ASSERT(((masm.framePushed() + sizeof(uintptr_t)) % JitStackAlignment) == 0);
>  
> +    // The setter currently has a jit entry or a non-lazy script. We will only

Same here.

::: js/src/jsfun.h
@@ +625,5 @@
> +        MOZ_ASSERT(u.wasm.jitEntry_);
> +        return u.wasm.jitEntry_;
> +    }
> +
> +    // AsmJS non optimized ctor store the func index in the jitinfo slot, since

Nit: s/ctor/ctors/?
(Assignee)

Comment 58

a year ago
Thanks for the quick review and feedback! A lot of great catches and comments here, all applied except for:

(In reply to Luke Wagner [:luke] from comment #55)
> ::: js/src/wasm/WasmFrameIter.cpp
> @@ +899,5 @@
> > +#if defined(JS_CODEGEN_ARM) || defined(JS_CODEGEN_MIPS32) || defined(JS_CODEGEN_MIPS64)
> > +        if (offsetFromEntry < PushedRetAddr) {
> > +            // We haven't pushed the jit return address yet, thus the jit
> > +            // frame is incomplete and we won't be able to unwind it; drop it.
> > +            return false;
> 
> Why do we need to lose this frame?  It seems like we aren't using
> returnAddress anywhere here and, if we were, we could get it from state.lr.

So the next thing that happens in profiling frame iteration is that the wasm::ProfilingFrameIter notices it's in a jit entry and stores the unwound fp; then it is passed to the jit::ProfilingFrameIter, but that one can't unwind the frame because it's incomplete. So it's better to drop the frame at this point. I'll tweak the comment to describe this better.

> ::: js/src/wasm/WasmStubs.cpp
> @@ +392,5 @@
> > +}
> > +
> > +// Load instance's TLS from the callee.
> > +static void
> > +GenerateJitEntryLoadTls(MacroAssembler& masm, unsigned frameSize)
> 
> Since it's so short now, how about inlining it where it's useful to see the
> context (where we are relative to the prologue, etc).

Disagreed: it's a few lines of code that would get duplicated (it's called twice), and any ctags-like system will make it very easy to jump to the function and get back where you were in the first place.

> @@ +463,5 @@
> > +
> > +    GenerateJitEntryLoadTls(masm, frameSize);
> > +
> > +    if (fe.sig().hasI64ArgOrRet()) {
> > +        masm.call(SymbolicAddress::ReportInt64JSCall);
> 
> Instead of having ReportInt64JSCall and this special case, could we have
> CanBeJitOptimized() return false when hasI64ArgOrRet?

Yes; that means some churn in jsfun.h as well (which is in the next patch), so I'll just drop this change in the final folded patch.

> ::: js/src/wasm/WasmTypes.h
> @@ +1053,5 @@
> >      enum Kind {
> >          Function,          // function definition
> >          InterpEntry,       // calls into wasm from C++
> > +        ImportJitExit,     // fast-path calling from wasm into jit code
> > +        JitEntry,          // calls into wasm from jit code
> 
> nit: How about having the order be: InterpEntry, JitEntry, ImportJitExit,
> ImportInterpExit?

Even better: interp entry / jit entry / interp exit / jit exit, so "jit" always follows "interp".
(Assignee)

Comment 59

a year ago
Posted patch 6.optimize-stub-convert.patch (obsolete) — Splinter Review
Implemented your suggestions for the stub optimizations; asking for review since it touches Assembler/MacroAssembler files for the new boxDouble variant. (It's a small loss for ARM, which is unfortunate, but oh well...)
Attachment #8949368 - Flags: review?(jdemooij)
(Reporter)

Comment 60

a year ago
(In reply to Benjamin Bouvier [:bbouvier] from comment #58)
> > Since it's so short now, how about inlining it where it's useful to see the
> > context (where we are relative to the prologue, etc).
> 
> Disagreed: it's a few lines of code that would get duplicated (it's called
> twice), 

Ah, I didn't see it was called twice.

> > > +        JitEntry,          // calls into wasm from jit code
> > 
> > nit: How about having the order be: InterpEntry, JitEntry, ImportJitExit,
> > ImportInterpExit?
> 
> Even better: interp entry / jit entry / interp exit / jit exit, so "jit"
> always follows "interp".

Oh hah, yes, that's what I meant to type :)
(Assignee)

Comment 61

a year ago
Posted patch 7.remove-i64-fast-path.patch (obsolete) — Splinter Review
This implements Luke's suggestion to remove the fast path when int64 appears in the signature; not much of it is really wasm-specific, and it touches more of the jsfun changes you've reviewed earlier.

It's a bit wacky because a JSFunction now has to distinguish between:
- asm.js vs wasm
- optimized vs not optimized

and all four combinations are possible. To avoid introducing a new JSFunction::flag (we're running out of space!), I decided that testing for isWasmOrAsmJS() consists of checking that the native is WasmCall() (true for both asm.js and wasm).

When we get rid of the asm.js optimizations, this will become much simpler (we could replace the AsmJS kind with a Wasm kind at that point).
Attachment #8949398 - Flags: review?(jdemooij)
Comment on attachment 8949104 [details] [diff] [review]
5.use-jit-entry.patch

Review of attachment 8949104 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great!

::: js/src/jsscript.h
@@ +899,5 @@
> +    // Pointer to baseline->method()->raw(), ion->method()->raw(), a wasm jit
> +    // entry, the JIT's EnterInterpreter stub, or the lazy link stub. Must be
> +    // non-null.
> +    uint8_t* jitCodeRaw_;
> +    uint8_t* jitCodeSkipArgCheck_;

Do we have static_asserts comparing the offset of these fields to the wasm entry offset?
Attachment #8949104 - Flags: review?(jdemooij) → review+
Comment on attachment 8949368 [details] [diff] [review]
6.optimize-stub-convert.patch

Review of attachment 8949368 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great, but I think we can simplify boxDouble a lot by defining it in MacroAssembler-inl.h and forwarding to storeDouble? We only want a separate method name to document we're boxing a Value (in case the boxing format changes).
Attachment #8949368 - Flags: review?(jdemooij)
(Assignee)

Comment 64

a year ago
(In reply to Jan de Mooij [:jandem] from comment #62)
> ::: js/src/jsscript.h
> @@ +899,5 @@
> > +    // Pointer to baseline->method()->raw(), ion->method()->raw(), a wasm jit
> > +    // entry, the JIT's EnterInterpreter stub, or the lazy link stub. Must be
> > +    // non-null.
> > +    uint8_t* jitCodeRaw_;
> > +    uint8_t* jitCodeSkipArgCheck_;
> 
> Do we have static_asserts comparing the offset of these fields to the wasm
> entry offset?

yes! Thanks for the review.
Comment on attachment 8949398 [details] [diff] [review]
7.remove-i64-fast-path.patch

Review of attachment 8949398 [details] [diff] [review]:
-----------------------------------------------------------------

Looks reasonable and I like the code removal, but I'm a bit confused by the different asm.js/wasm optimized/unoptimized cases, see the comment below.

::: js/src/jsfun.h
@@ +190,5 @@
>      bool isNative()                 const { return !isInterpreted(); }
>  
>      bool isConstructor()            const { return flags() & CONSTRUCTOR; }
>  
>      /* Possible attributes of a native function: */

This is becoming complicated and I don't fully understand all cases. It would be great to have a comment here explaining the different types of natives, something like:

* Builtin: ...has field X, flag Y...
* Wasm optimized: has field A, used for B.
* Wasm unoptimized: has field C, used for D.
* Asm.js optimized: ...
* Asm.js unoptimized: ...

It would also be good to explain how isAsmJSNative and isBuiltinNative map to these different kinds.
Attachment #8949398 - Flags: review?(jdemooij)
(Assignee)

Comment 66

a year ago
(duh for me, good thing I asked for a review!)
Attachment #8949368 - Attachment is obsolete: true
Attachment #8949671 - Flags: review?(jdemooij)
(Assignee)

Comment 67

a year ago
As discussed with Luke:

- we keep the optimized path even for i64 functions
- asm.js doesn't get optimized, to give an incentive to move to wasm and to keep the code simple and clean.

The latter really simplifies jsfun.
Attachment #8949398 - Attachment is obsolete: true
Attachment #8949672 - Flags: review?(jdemooij)
(Assignee)

Updated

a year ago
Attachment #8946757 - Attachment is obsolete: true
(Assignee)

Updated

a year ago
Attachment #8948746 - Attachment is obsolete: true
Attachment #8949671 - Flags: review?(jdemooij) → review+
Comment on attachment 8949672 [details] [diff] [review]
7.disable-asmjs-opt.patch

Review of attachment 8949672 [details] [diff] [review]:
-----------------------------------------------------------------

Beautiful. A lot simpler!
Attachment #8949672 - Flags: review?(jdemooij) → review+

Comment 69

a year ago
Pushed by bbouvier@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c38fed7ef9cb
Add tests; r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/cb6130865cac
Implement the jit-to-wasm entry stub and use it; r=luke, r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/fd805f7ee600
Slightly optimize the jit->wasm stub; r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/7d4b30834cad
Remove the optimization for asm.js; r=jandem
(Assignee)

Comment 71

a year ago
Some post-landing benchmarks:

Usual benchmarking caveats:
- for a given VM, benchmarks can't be compared to one another: because they differ widely in performance, each benchmark needed its own iteration count.
- for a given benchmark, it's a bit unfair to draw general conclusions about other VMs, because we focus here on cases that SpiderMonkey specifically optimizes; we're not sure other VMs optimize those same cases as well.

In below content:
- callKnown refers to the monomorphic case (i.e. we know what the callee is).
- callGeneric refers to the polymorphic case (i.e. we don't know what the callee is).
- scripted getter/setter are self-explanatory.
- "function.apply array" is the case where one would do this: function f() {} f.apply(this, [1, 2]);
- "function.apply arguments" is the same case but using "arguments" instead of an array.
- "function.call" is the function f(){} f.call(this, 1, 2) case.
- "rectifying" means the callee expects more arguments than are provided at the call site (e.g. function f(a,b) {} called with f(1)).

The wasm function just does a simple addition of two int32 numbers; in the 0-argument case, both numbers are constants; in the 1-arg case, one number is a constant; in the 2-args case, the two parameters are added together.
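For reference, the call shapes defined above can be sketched in plain JS. This is an illustrative stand-in, not the actual benchmark harness; the callee here is a JS function in place of the wasm export, and the names are made up:

```javascript
// A two-argument callee, analogous to the wasm function adding two int32s.
function add(a, b) { return (a | 0) + (b | 0); }

// callKnown: monomorphic, the same callee at every iteration.
let known = 0;
for (let i = 0; i < 1000; i++)
  known += add(i, 1);

// rectifying: the callee expects two arguments but gets one; the engine
// pads the missing argument with undefined ((undefined|0) === 0 here).
const rect = add(1);                       // 1

// function.apply with an array, and function.call.
const viaApply = add.apply(this, [1, 2]);  // 3
const viaCall  = add.call(this, 1, 2);     // 3
```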

===

First benchmark: 100M calls.

spidermonkey-before:

call known 0args time:  4468.31 (ms)
call known 1arg time:  4496.87
call known 2args time:  5477.03
call known 2args rectifying1 time:  6014.86
call generic 2args time:  5968.78
call generic 2args rectifying1 time:  6149.09
scripted getter time:  4505.62
scripted getter rectifiying time:  6640.11
scripted setter time:  4484.73
scripted setter rectifiying time:  5792.48
function.apply array time:  9735.14
function.apply array rectifying time:  9635.35
function.apply args time:  5915.26
function.apply args rectifying time:  6333
function.call time:  6319.61
function.call rectifying time:  6308.4

spidermonkey-after

call known 0args time:  373.47
call known 1arg time:  620.41
call known 2args time:  718.12
call known 2args rectifying1 time:  746.04
call generic 2args time:  932.34
call generic 2args rectifying1 time:  955.65
scripted getter time:  315.44
scripted getter rectifiying time:  637.64
scripted setter time:  598.94
scripted setter rectifiying time:  693.87
function.apply array time:  1407.39
function.apply array rectifying time:  1631.87
function.apply args time:  721.47
function.apply args rectifying time:  741.82
function.call time:  744.11
function.call rectifying time:  743.85

v8:

call known 0args time:  317.14
call known 1arg time:  458
call known 2args time:  599.48
call known 2args rectifying1 time:  1284.08
call generic 2args time:  1246.78
call generic 2args rectifying1 time:  2422.65
scripted getter time:  293.01
scripted getter rectifiying time:  2398.15
scripted setter time:  389.89
scripted setter rectifiying time:  9122.49
function.apply array time:  1604.21
function.apply array rectifying time:  2171.43
function.apply args time:  608
function.apply args rectifying time:  1305.95
function.call time:  1269.02
function.call rectifying time:  1273.65

===

Second benchmark, to get a better feel for the differences between SpiderMonkey and V8: iteration count raised to 1G calls.

spidermonkey-after:

call known 0args time:  3611.26
call known 1arg time:  6242.61
call known 2args time:  7223.34
call known 2args rectifying1 time:  7429.34
call generic 2args time:  9409.28
call generic 2args rectifying1 time:  9842.55
scripted getter time:  3147.34
scripted getter rectifiying time:  6505.59
scripted setter time:  5997.26
scripted setter rectifiying time:  6799.42
function.apply array time:  13651.08
function.apply array rectifying time:  15907.44
function.apply args time:  7224.51
function.apply args rectifying time:  7445.38
function.call time:  7419.84
function.call rectifying time:  7452.37

v8-after:

call known 0args time:  3425.9
call known 1arg time:  4219.59
call known 2args time:  6005.81
call known 2args rectifying1 time:  12802.99
call generic 2args time:  12811.01
call generic 2args rectifying1 time:  24212.94
scripted getter time:  2786.99
scripted getter rectifiying time:  24311.05
scripted setter time:  4026.39
scripted setter rectifiying time:  91326.8
function.apply array time:  16251.29
function.apply array rectifying time:  23203.92
function.apply args time:  5677.56
function.apply args rectifying time:  12745.49
function.call time:  12733.76
function.call rectifying time:  13088.8

V8 is really good at the monomorphic case; it'd be interesting to see if they are doing something in particular there, and whether we can do better. Opened bug 1437065 to investigate.

===

Third benchmark, to compare the difference between non-inlined calls in JS vs wasm, only in spidermonkey:

spidermonkey-before:
call known JS time:  3197.78
call known 2args time:  56114.73

spidermonkey-after:
call known JS time:  3138.43
call known 2args time:  7216.65

So SpiderMonkey JS->wasm calls are now only 2.3 times slower than plain JS->JS calls (vs ~20 times slower before).
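The JS->wasm call pattern being measured looks like the sketch below. Since the shell-only wasmTextToBinary helper isn't available everywhere, the module is hand-encoded; the byte array is equivalent to `(module (func (export "add") (param i32 i32) (result i32) local.get 0 local.get 1 i32.add))`:

```javascript
// Hand-encoded wasm module exporting "add": (i32, i32) -> i32.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32,i32)->i32
  0x03, 0x02, 0x01, 0x00,                               // func 0 has type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add" = func 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // get 0, get 1, i32.add, end
]);

const { add } = new WebAssembly.Instance(new WebAssembly.Module(bytes)).exports;

// A monomorphic JS->wasm call in a hot loop: with the jit entry stub this
// goes through jit code directly instead of the generic C++ WasmCall path.
let sum = 0;
for (let i = 0; i < 100000; i++)
  sum += add(i, 1);
```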
(Assignee)

Updated

a year ago
Keywords: leave-open
(Reporter)

Comment 73

a year ago
\o/ \o/ \o/
status-firefox53: affected → ---
Depends on: 1437501
Depends on: 1437499
Depends on: 1455612
You need to log in before you can comment on or make changes to this bug.