1232205 - wasm: Baseline JIT

Lars T Hansen [:lth]

Assignee

9 years ago

Attached file bug1232205-wasm-baseline.bundle (obsolete)

A bundle containing my patch queue, including the 2016-04-25 postorder patch at the root. x64 only, and still somewhat messy. See comments at the head of asmjs/WasmBaselineCompile.cpp for more status information. This passes all our in-tree wasm tests (with an adjustment to hasI64, included here) and all our in-tree asm.js tests except (a) those that use non-signaling bounds or interrupt checks, and (b) one modulo test, commented out here.

Lars T Hansen [:lth]

9 years ago

Depends on: 1272640

Dan Gohman [:sunfish]

Comment 6

9 years ago

Attached patch combined.patch (obsolete)

For convenience for people who want to take a peek, here's a combined version of the current bundle as a single patchfile.

Lars T Hansen [:lth]

Assignee

Comment 7

9 years ago

Attached file bug1232205-wasm-baseline.bundle (obsolete)

Cleaned up and rebased to current m-i. A few tests remain commented out (see separate patches in this set); it passes all other wasm and asm.js tests and runs AngryBots.

Attachment #8754941 - Attachment is obsolete: true

Attachment #8755117 - Attachment is obsolete: true

Lars T Hansen [:lth]

Assignee

Comment 8

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=fbf1c7d3c252

Lars T Hansen [:lth]

Assignee

Comment 9

9 years ago

Attached file bug1232205-wasm-baseline.bundle (obsolete)

Additionally compiles on win32, win64, linux32, linux32-arm-sim, linux64, linux64-arm64-sim, mac64.

Attachment #8755356 - Attachment is obsolete: true

Lars T Hansen [:lth]

Assignee

Comment 10

9 years ago

Attached patch bug1232205-infrastructure.patch (obsolete)

Part 1, infrastructure. This hooks the baseline compiler into the rest of the engine and creates the necessary infrastructure to trigger baseline compilation when appropriate, but the baseline compiler is just a stub (and the mechanism that determines whether to trigger it always returns false).

Attachment #8756853 - Flags: review?(bbouvier)

Douglas Crosher [:dougc]

Comment 11

9 years ago

Might it be practical to note ranges of local variables that hold a constant and defer emitting the constant until necessary? The main motivation would be an alternative to determining live ranges: where a `set_local` operation dominates the uses, we could avoid flushing a register to memory. In some cases the value might be unused, and then the constant need never be materialized. This would support a `get_and_zero_local` operator that could be used to help reduce flushing of locals cached in registers.

Oh, and could the baseline compiler cache local variables in registers too, not just the values stack? With this optimization I could explore the expressionless encoding again. Since it is not constrained by the stack ordering, it might be able to optimize operator ordering to reduce register pressure, and then it may well be the encoding that gives the best performance from a baseline compiler. Something to explore. Even being able to defer emitting a constant in a local variable would help work around the stack-ordering constraints and could help reduce register pressure, so it seems useful even without the above use case.

For decoding to SSA: if wasm code can use `get_and_zero_local` on the last use of a local, then at control-flow merge points the locals that are not live will be flagged as constant zero rather than having potentially different definitions. This would avoid the need to emit a phi for them; instead, a single definition flagged as zero could be carried forward (do you think that would work?). What do you think?

Lars T Hansen [:lth]

Assignee

Comment 12

9 years ago

(In reply to Douglas Crosher [:dougc] from comment #11)
> Might it be practical to note ranges of local variables that hold a constant
> and defer emitting the constant until necessary?

While it is certainly possible for the wasm code to contain assignments of constants to locals, I would expect the generator of the wasm code to perform constant propagation and get rid of the bulk of such code. (Initializations excepted.) I expect that for your expressionless encoding this will be a much more acute problem, but as it is not a problem for the existing wasm format I'm probably disinclined to try to do anything about it.

> Oh, and could the
> baseline compiler cache local variables in registers too, not just the
> values stack?

Yes, I do want to do something in that direction, but I've not decided exactly how to do it. The two obvious choices are to assign some fixed locals to registers (eg incoming parameters that are in registers + the first few locals, depending on how many registers we want to use for this) or to cache local variables in straight-line code. A combination of the two is appealing. It may be a little while before I start looking into this optimization though, there are many other things that are higher priority.

> With this optimization I could explore the expressionless encoding again and
> since it is not constrained by the stack ordering it might be able to
> optimize operator ordering to reduce register pressure and then this may
> well be the encoding that gives the best performance from a baseline
> compiler. Something to explore.

Be my guest. The baseline compiler's goal is to generate acceptable code at high speed - we don't want "good" code if that will significantly impact the compiler's speed.

Lars T Hansen [:lth]

Assignee

Comment 13

9 years ago

(In reply to Douglas Crosher [:dougc] from comment #11)
> Might it be practical to note ranges of local variables that hold a constant
> and defer emitting the constant until necessary?
...
> Oh, and could the baseline compiler cache local variables in registers too, not just the values stack?

These might be two sides of the same coin, as they are about having alternative "locations" for locals. The baseline compiler could probably do something clever in straight-line code and notably across calls, but it would probably want to spill at control flow boundaries (if / block / loop).
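The "alternative locations for locals" idea can be pictured as a per-local record saying whether the current value is in its stack slot, in a register, or known to be a constant, with everything flushed back to memory at control-flow boundaries. This is a minimal standalone sketch under that assumption only: `LocalState`, `Location`, and the string "instructions" are hypothetical names for illustration, not SpiderMonkey code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model: where the current value of a local lives.
enum class Location { Stack, Register, Constant };

struct LocalInfo {
    Location loc = Location::Stack;
    int reg = -1;       // meaningful only when loc == Location::Register
    int32_t value = 0;  // meaningful only when loc == Location::Constant
};

class LocalState {
  public:
    explicit LocalState(size_t numLocals) : locals_(numLocals) {}

    // set_local of a constant: remember it, emit nothing yet.
    void setConstant(size_t i, int32_t v) {
        locals_.at(i).loc = Location::Constant;
        locals_.at(i).value = v;
    }

    // set_local from a register: keep it cached, no store yet.
    void setRegister(size_t i, int reg) {
        assert(reg >= 0);
        locals_.at(i).loc = Location::Register;
        locals_.at(i).reg = reg;
    }

    // At if/block/loop boundaries, write every non-memory local back to its
    // stack slot so all incoming control-flow edges agree on where it lives.
    void syncAtControlFlow() {
        for (size_t i = 0; i < locals_.size(); i++) {
            LocalInfo& l = locals_[i];
            std::string slot = "slot" + std::to_string(i);
            if (l.loc == Location::Constant)
                emit("store_imm " + std::to_string(l.value) + " -> " + slot);
            else if (l.loc == Location::Register)
                emit("store r" + std::to_string(l.reg) + " -> " + slot);
            l.loc = Location::Stack;
        }
    }

    Location where(size_t i) const { return locals_.at(i).loc; }
    const std::vector<std::string>& code() const { return code_; }

  private:
    void emit(const std::string& insn) { code_.push_back(insn); }

    std::vector<LocalInfo> locals_;
    std::vector<std::string> code_;  // stand-in for emitted machine code
};
```

A constant that is overwritten before the next boundary is never materialized, which is the saving Douglas describes. The sketch deliberately ignores registers being clobbered (for example by calls), which a real compiler would have to track.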

Douglas Crosher [:dougc]

Comment 14

9 years ago

(In reply to Lars T Hansen [:lth] from comment #12)
> (In reply to Douglas Crosher [:dougc] from comment #11)
> > Might it be practical to note ranges of local variables that hold a constant
> > and defer emitting the constant until necessary?
>
> While it is certainly possible for the wasm code to contain assignments of
> constants to locals, I would expect the generator of the wasm code to
> perform constant propagation and get rid of the bulk of such code.
> (Initializations excepted.)

Good point, but fwiw here are some numbers:

zlib: 10% of set_locals have a constant value.
AngryBots: also 10% of set_locals have a constant value.

Guess this is relatively small, but these numbers help a little to make a case for this optimization, just as constants on the values stack are optimized.

> I expect that for your expressionless encoding this will be a much more
> acute problem, but as it is not a problem for the existing wasm format I'm
> probably disinclined to try to do anything about it.

Yes, it's an 'acute problem' for the expressionless style. Keep in mind that some producers might want to emit code in this style to avoid the burden of shoehorning their output into the stack order.

Another optimization that would help for the expressionless style is to note the result local variables for operations in general as these locals would then be known to not be live after reading the arguments and might help avoid flushing some values to memory. This might require looking ahead with the current encoding, so might be a bit of a burden.

> > Oh, and could the
> > baseline compiler cache local variables in registers too, not just the
> > values stack?
>
> Yes, I do want to do something in that direction, but I've not decided
> exactly how to do it. The two obvious choices are to assign some fixed
> locals to registers (eg incoming parameters that are in registers + the
> first few locals, depending on how many registers we want to use for this)
> or to cache local variables in straight-line code. A combination of the two
> is appealing. It may be a little while before I start looking into this
> optimization though, there are many other things that are higher priority.

That sounds great. I think binaryen can already sort locals to bias high-use-frequency locals into low indexes, which might fit with this strategy. But an LRU cache might be simple enough, and the registers are a resource that might be best shared between local variables and the expression stack values, so they might need a common cache.

> > With this optimization I could explore the expressionless encoding again and
> > since it is not constrained by the stack ordering it might be able to
> > optimize operator ordering to reduce register pressure and then this may
> > well be the encoding that gives the best performance from a baseline
> > compiler. Something to explore.
>
> Be my guest. The baseline compiler's goal is to generate acceptable code at
> high speed - we don't want "good" code if that will significantly impact the
> compiler's speed.

Better code at high compilation speed is what I have in mind. So the producer would have the burden of replacing get_local by get_and_zero_local, and all the baseline compiler needs to do is defer emitting these zero constants, which does not appear to be a large performance burden and is something already done for the expression stack values.

It will be a smaller step for me after the baseline compiler can cache locals in registers, so if that is on the horizon then I will keep following. But I can start exploring the wasm input code generation and studying what potential there is to reduce the live set by reordering operators etc.

Do you see any merit in the approach though?

Do you think it could minimize unnecessary writes of values cached in registers?

Can you see any obvious and significant cases it would not handle?

Do you think there is any potential for a high speed compiler to emit better code if the producer optimizes the operation order to minimize the live set which could reduce register pressure?
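The LRU suggestion above, with one register pool shared by locals and expression-stack values, might look roughly like the following. This is a hypothetical standalone sketch: `RegCache`, the integer `ValueId` encoding, and the "report the evicted value so the caller can spill it" protocol are illustrative assumptions, not how the baseline compiler actually allocates registers.

```cpp
#include <cassert>
#include <list>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical: locals and stack temporaries share one id space (e.g. locals
// 0..N-1, temporaries N and up), so they compete for the same registers.
using ValueId = int;

class RegCache {
  public:
    explicit RegCache(int numRegs) {
        assert(numRegs > 0);
        for (int r = numRegs - 1; r >= 0; r--)
            free_.push_back(r);  // register 0 is handed out first
    }

    // Return the register holding `v`, allocating one if necessary. If a
    // value had to be evicted, its id is stored in *evicted so the caller
    // can spill it to memory first.
    int acquire(ValueId v, std::optional<ValueId>* evicted) {
        evicted->reset();
        auto it = map_.find(v);
        if (it != map_.end()) {
            // Hit: move to the most-recently-used position.
            lru_.splice(lru_.begin(), lru_, it->second.pos);
            return it->second.reg;
        }
        int reg;
        if (!free_.empty()) {
            reg = free_.back();
            free_.pop_back();
        } else {
            ValueId victim = lru_.back();  // least recently used
            lru_.pop_back();
            reg = map_.at(victim).reg;
            map_.erase(victim);
            *evicted = victim;
        }
        lru_.push_front(v);
        map_.emplace(v, Entry{reg, lru_.begin()});
        return reg;
    }

    // The value is dead (e.g. consumed from the stack): free its register.
    void release(ValueId v) {
        auto it = map_.find(v);
        if (it == map_.end())
            return;
        free_.push_back(it->second.reg);
        lru_.erase(it->second.pos);
        map_.erase(it);
    }

  private:
    struct Entry {
        int reg;
        std::list<ValueId>::iterator pos;
    };
    std::list<ValueId> lru_;  // front = most recently used
    std::unordered_map<ValueId, Entry> map_;
    std::vector<int> free_;
};
```

Each lookup and eviction is O(1), which matters for a compiler whose main goal is speed; whether LRU is the right eviction policy for wasm code is exactly the kind of question the thread leaves open.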

Lars T Hansen [:lth]

Assignee

Comment 15

9 years ago

(In reply to Douglas Crosher [:dougc] from comment #14)
> zlib: 10% of set_locals have a constant value.
> AngryBots: also 10% of set_locals have a constant value.

That's interesting, but it's hard to know what that means without knowing what happens next. If the locals are read in the same basic block then we could optimize, but I'd expect the C++ compiler to have constant-propagated already.

> Another optimization that would help for the expressionless style is to note
> the result local variables for operations in general as these locals would
> then be known to not be live after reading the arguments and might help
> avoid flushing some values to memory. This might require looking ahead with
> the current encoding, so might be a bit of a burden.

I see the appeal of that, but I don't quite see it happening :)

> But I can start exploring the wasm input code generation and studying what
> potential there is to reduce the live set by reordering operators etc.
>
> Do you see any merit in the approach though?
>
> Do you think it could minimize unnecessary writes of values cached in
> registers?
>
> Can you see any obvious and significant cases it would not handle?

For all those questions: I don't have the data. It's somewhat plausible. It may or may not matter.

> Do you think there is any potential for a high speed compiler to emit better
> code if the producer optimizes the operation order to minimize the live set
> which could reduce register pressure?

Yes, I think that is probably the case. But I don't know if the producer should do that, because surely the producer wants to produce code that makes the 2nd-tier jit produce better code? After all the point of the baseline compiler is to get the application up and running (throughput), but the expectation is that all truly performance-sensitive code will hit the 2nd-tier jit pretty quickly.

Douglas Crosher [:dougc]

Comment 16

9 years ago

(In reply to Lars T Hansen [:lth] from comment #15)
> (In reply to Douglas Crosher [:dougc] from comment #14)
...
> > Do you think there is any potential for a high speed compiler to emit better
> > code if the producer optimizes the operation order to minimize the live set
> > which could reduce register pressure?
>
> Yes, I think that is probably the case. But I don't know if the producer
> should do that, because surely the producer wants to produce code that makes
> the 2nd-tier jit produce better code? After all the point of the baseline
> compiler is to get the application up and running (throughput), but the
> expectation is that all truly performance-sensitive code will hit the
> 2nd-tier jit pretty quickly.

fwiw, nothing comes to mind that would disadvantage a higher-performance compiler if the operators were ordered to reduce the live set. One might argue that the producer should not have to bother with this and should be free to arrange the order to optimize encoded size, or even that the encoding should be optimized for interpretation. Anyway, it might be interesting to get some data, and I'll keep exploring it.
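For expression trees, "order the operators to reduce the live set" has a classic textbook form: Sethi-Ullman numbering, which computes how many registers a subtree needs and evaluates the more demanding operand first. It is offered here only as an illustration of what a producer could do; it is not something proposed in this bug, and the `Node` type is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical binary expression tree; leaves are locals or constants.
struct Node {
    std::unique_ptr<Node> left, right;  // both null for a leaf
    bool isLeaf() const { return !left && !right; }
};

// Minimum number of registers needed to evaluate `n` without spilling,
// assuming each operation takes both operands from registers.
int registerNeed(const Node& n) {
    if (n.isLeaf())
        return 1;
    assert(n.left && n.right);  // binary operators only
    int l = registerNeed(*n.left);
    int r = registerNeed(*n.right);
    // Evaluate the more demanding side first; its result then occupies one
    // register while the other side is evaluated.
    return l == r ? l + 1 : std::max(l, r);
}

// True if the right operand should be evaluated before the left one.
bool evaluateRightFirst(const Node& n) {
    return registerNeed(*n.right) > registerNeed(*n.left);
}
```

In a stack encoding the operand order is semantically significant, so a producer could only exploit this directly for commutative operators, or by restructuring the computation through locals.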

Benjamin Bouvier [:bbouvier] (inactive)

Comment 17

9 years ago

Comment on attachment 8756853
bug1232205-infrastructure.patch

Review of attachment 8756853:
-----------------------------------------------------------------

Thanks! I assume this would land with the entire baseline compiler set of patches. Otherwise, it'd be nice to know how the booleans atomics/simdObserved are getting used, or just defer them to another patch.

::: js/src/asmjs/AsmJS.cpp
@@ +1682,5 @@
> ImportMap importMap_;
> ArrayViewVector arrayViews_;
> bool atomicsPresent_;
> + bool atomicsObserved_; // Within function body
> + bool simdObserved_; // Within function body

Might be useful to indicate this can be set before any function is seen (e.g. globals section) but will be unset before we run into the first function.

@@ +2529,5 @@
> ExtractSimdValue(ModuleValidator& m, ParseNode* pn)
> {
> MOZ_ASSERT(IsSimdLiteral(m, pn));
>
> + m.simdObserved();

nit: setSimdObserved()

::: js/src/asmjs/WasmBaselineCompile.h
@@ +1,4 @@
> +/* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 4 -*-
> + * vim: set ts=8 sts=4 et sw=4 tw=99:
> + *
> + * Copyright 2015 Mozilla Foundation

(uber nit: 2016)

@@ +20,5 @@
> +#define asmjs_wasm_baseline_compile_h
> +
> +#include "asmjs/WasmBinary.h"
> +#include "asmjs/WasmIonCompile.h"
> +#include "jit/MacroAssembler.h"

jit/MacroAssembler.h looks like it's not needed right now.

::: js/src/asmjs/WasmGenerator.h
@@ +260,5 @@
> + void setUsesAtomics() {
> + usesAtomics_ = true;
> + }
> +
> + bool usesSignalsForInterrupts() const {

I guess the purpose of usesSimd(), usesAtomics(), usesSignalsForInterrupts() is to prevent baseline compilation of functions, right? It is not used right now, but I am pretty sure it is going to get used in the patch series, so fine to keep in this patch.

Do we abort baseline compilation whenever any of these three booleans is set to true? If so, we could just have one boolean canBaselineCompile. Or something else? I would love to see how these are used in BaselineCanCompile already.

::: js/src/asmjs/WasmIonCompile.cpp
@@ +3475,5 @@
> + case wasm::IonCompileTask::CompileMode::Ion:
> + return wasm::IonCompileFunction(task);
> + case wasm::IonCompileTask::CompileMode::Baseline:
> + return wasm::BaselineCompileFunction(task);
> + default:

There are not too many modes: can we just explicitly mention None here, so that any new mode (!) would trigger a compilation warning/error, please?

Attachment #8756853 - Flags: review?(bbouvier) → review+

Lars T Hansen [:lth]

Assignee

Comment 18

9 years ago

(In reply to Benjamin Bouvier [:bbouvier] from comment #17)
> Comment on attachment 8756853
> bug1232205-infrastructure.patch
>
> Thanks! I assume this would land with the entire baseline compiler set of
> patches. Otherwise, it'd be nice to know how the booleans
> atomics/simdObserved are getting used, or just defer them to another patch.

This will not land by itself, not to worry.

> Do we abort baseline compilation whenever any of these three booleans is set
> to true? If so, we could just have one boolean canBaselineCompile. Or
> something else? I would love to see how these are used in BaselineCanCompile
> already.

The point here is to defer the decision to the baseline compiler, which knows what it can do on the particular platform; these flags may be used by the baseline compiler, or they may not - it is platform dependent. (Right now they are used on x64 but not elsewhere because the baseline compiler turns itself off on other platforms, for example.) This is a moderately clean separation of concerns.

The followup patch should arrive today or tomorrow; I'm cleaning it up now.
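The separation of concerns Lars describes amounts to the generator recording raw per-function facts and a platform-aware predicate owned by the baseline compiler interpreting them. A hypothetical sketch: `FunctionTraits`, `Platform`, and `baselineCanCompile` are illustrative names (the thread mentions a `BaselineCanCompile`, but its real signature is not shown), and the x64 rule is an assumption based on the earlier comments that the compiler is x64-only and does not handle non-signaling interrupt checks.

```cpp
#include <cassert>

enum class Platform { X86, X64, ARM, ARM64, MIPS };

// Facts recorded during validation; the generator does not interpret them.
struct FunctionTraits {
    bool usesSimd = false;
    bool usesAtomics = false;
    bool usesSignalsForInterrupts = true;
};

// Hypothetical policy owned by the baseline compiler: each platform port can
// change what it supports without touching the validator or the generator.
bool baselineCanCompile(Platform p, const FunctionTraits& f) {
    switch (p) {
      case Platform::X64:
        // Assumed rule: no SIMD, no atomics, signal-based interrupt checks.
        return !f.usesSimd && !f.usesAtomics && f.usesSignalsForInterrupts;
      case Platform::X86:
      case Platform::ARM:
      case Platform::ARM64:
      case Platform::MIPS:
        return false;  // baseline compiler turned off on other platforms
    }
    assert(false && "unknown platform");
    return false;
}
```

Collapsing the flags into a single `canBaselineCompile` boolean, as Benjamin suggested, would move this platform knowledge into the generator instead.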

Lars T Hansen [:lth]

Assignee

Comment 19

9 years ago

Attached patch bug1232205-allsinglemask.patch

Patch 2/8: Define AllSingleMask on all platforms. The baseline compiler uses this to manage float32 registers. It was previously defined only on MIPS (!) but I've been unable to find a reason why it could not be defined generally.

Attachment #8755851 - Attachment is obsolete: true

Lars T Hansen [:lth]

•

9 years ago

Depends on: 1277008

Lars T Hansen [:lth]

Assignee

Attachment #8757957 - Flags: review?(bbouvier)

Attachment #8758613 - Flags: review?(bbouvier)

Attachment #8758708 - Flags: review?(luke)

Attachment #8758708 - Flags: review?(bbouvier)

Lars T Hansen [:lth]

Assignee

Comment 36

9 years ago

(In reply to Luke Wagner [:luke] from comment #35)
> Comment on attachment 8756853
> bug1232205-infrastructure.patch
>
> Review of attachment 8756853:
> -----------------------------------------------------------------
>
> Drive-by nits:
>
> ::: js/src/asmjs/AsmJS.cpp
> @@ +6991,5 @@
> >
> > + if (m.simdObserved())
> > + f.setUsesSimd();
> > + if (m.atomicsObserved())
> > + f.setUsesAtomics();
>
> The FunctionValidator is available everywhere that simd/atomics;

Several places I only have a ModuleValidator, eg, in IsCoercionCall() and IsSimdLiteral(), is there a trick to get to a FunctionValidator from the ModuleValidator or from another value available there? Those are both static functions.

> can you
> remove the state/methods from ModuleValidator and call the FunctionValidator
> methods directly? This has the advantage of not requiring any explicit
> clearing.

If there's a way to get the FunctionValidator, absolutely happy to do this.

Luke Wagner [:luke]

Comment 37

9 years ago

(In reply to Lars T Hansen [:lth] from comment #36)
> Several places I only have a ModuleValidator, eg, in IsCoercionCall() and
> IsSimdLiteral(), is there a trick to get to a FunctionValidator from the
> ModuleValidator or from another value available there? Those are both
> static functions.

Ah, I see, b/c those functions are also called in a module-level context for globals. Really, the root issue here is detecting SIMD local types. Perhaps instead you could have the FunctionValidator, once it finishes the function body (in FV::finish(), effectively the same place you're currently propagating those flags from MV to FV), just scan its list of local types and flag SIMD?

Lars T Hansen [:lth]

Assignee

Comment 38

9 years ago

(In reply to Luke Wagner [:luke] from comment #37)
> (In reply to Lars T Hansen [:lth] from comment #36)
> > Several places I only have a ModuleValidator, eg, in IsCoercionCall() and
> > IsSimdLiteral(), is there a trick to get to a FunctionValidator from the
> > ModuleValidator or from another value available there? Those are both
> > static functions.
>
> Ah, I see, b/c those functions are also called in a module-level context for
> globals. Really, the root issue here is detecting SIMD local types.
> Perhaps instead you could have the FunctionValidator, once it finishes the
> function body (in FV::finish(), effectively the same place you're currently
> propagating those flags from MV to FV), just scan its list of local types
> and flag SIMD?

Just to clarify, examining the types by themselves is not enough for all cases, but along with examining operations this seems to work fine. Thanks!

Expect a review coming up, just to be on the safe side...
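The approach settled on here (flag SIMD/atomics operations while validating the body, then scan the declared local types when the function finishes) keeps all state per function, so nothing needs clearing at module level. A hypothetical standalone sketch: `ValType`, `FunctionFlags`, `noteSimdOp`, and `finish` are illustrative names, not the AsmJS.cpp API.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical local types; the last three stand for asm.js SIMD types.
enum class ValType { I32, F32, F64, I32x4, F32x4, B32x4 };

bool isSimdType(ValType t) {
    return t == ValType::I32x4 || t == ValType::F32x4 || t == ValType::B32x4;
}

class FunctionFlags {
  public:
    explicit FunctionFlags(std::vector<ValType> locals) : locals_(std::move(locals)) {}

    // Called while validating operations: catches cases the local types
    // alone would miss, e.g. a SIMD value that is never stored in a local.
    void noteSimdOp() { usesSimd_ = true; }
    void noteAtomicsOp() { usesAtomics_ = true; }

    // End of the function body: also flag SIMD if any local has a SIMD type.
    void finish() {
        for (ValType t : locals_) {
            if (isSimdType(t)) {
                usesSimd_ = true;
                break;
            }
        }
    }

    bool usesSimd() const { return usesSimd_; }
    bool usesAtomics() const { return usesAtomics_; }

  private:
    std::vector<ValType> locals_;
    bool usesSimd_ = false;
    bool usesAtomics_ = false;
};
```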

Lars T Hansen [:lth]

Assignee

Comment 39

9 years ago

Attached patch bug1232205-infrastructure-v2.patch

This addresses all nits from Benjamin and Luke. Carrying forward Benjamin's r+. Luke, can you look over the cleaned-up attribute computation in AsmJS.cpp? Thanks.

Attachment #8756853 - Attachment is obsolete: true

Attachment #8759163 - Flags: review?(luke)

Lars T Hansen [:lth]

Assignee

Updated

9 years ago

Attachment #8759163 - Flags: review+

Lars T Hansen [:lth]

Assignee

Comment 40

9 years ago

(In reply to Benjamin Bouvier [:bbouvier] from comment #27)
> Comment on attachment 8757946
> bug1232205-allsinglemask.patch
>
> @@ +296,1 @@
> > static const SetType AllMask = ((1ull << 48) - 1);
>
> ... could we take advantage of this patch to make this code easier to
> understand for newcomers?
>
> Double check me, but I think we'd have:
>
> AllDoubleMask = ((1ull << TotalDouble) - 1) << TotalSingle;
> AllSingleMask = (1ull << TotalSingle) - 1;
> AllMask = (1ull << invalid_freg); // not sure this one makes it clearer

That's all correct, though the last one doesn't smell too good. AllDoubleMask | AllSingleMask is better, and is what MIPS uses. I'll tidy it up; there's no cost to doing so.
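For newcomers, the mask arithmetic discussed above can be checked at compile time. In this sketch the counts `TotalSingle = 16` and `TotalDouble = 16` are assumed purely for illustration (they are not taken from any particular platform header); singles occupy the low bits, doubles the bits just above them, and `AllMask` is formed as `AllDoubleMask | AllSingleMask`, as Lars prefers.

```cpp
#include <cassert>
#include <cstdint>

using SetType = uint64_t;

// Illustrative register counts (assumption, not a real platform's values).
constexpr uint32_t TotalSingle = 16;
constexpr uint32_t TotalDouble = 16;

// One bit per single-precision register, in the low TotalSingle bits.
constexpr SetType AllSingleMask = (1ull << TotalSingle) - 1;
// One bit per double-precision register, placed above the singles.
constexpr SetType AllDoubleMask = ((1ull << TotalDouble) - 1) << TotalSingle;
// The union of the two, rather than a shift by an "invalid register" code.
constexpr SetType AllMask = AllDoubleMask | AllSingleMask;

static_assert((AllSingleMask & AllDoubleMask) == 0, "masks must not overlap");
static_assert(AllMask == (1ull << (TotalSingle + TotalDouble)) - 1,
              "AllMask covers exactly the single and double bits");
```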

Lars T Hansen [:lth]

Assignee

Updated

9 years ago

Blocks: 1277008

No longer depends on: 1277008

Lars T Hansen [:lth]

Assignee

Updated

9 years ago

Blocks: 1277011

No longer depends on: 1277011

Luke Wagner [:luke]

Comment 41

9 years ago

Comment on attachment 8759163
bug1232205-infrastructure-v2.patch

Review of attachment 8759163:
-----------------------------------------------------------------

Excellent, thank you!

::: js/src/asmjs/AsmJS.cpp
@@ +2867,5 @@
> MOZ_ASSERT(continuableStack_.empty());
> MOZ_ASSERT(breakLabels_.empty());
> MOZ_ASSERT(continueLabels_.empty());
> +
> + for ( auto iter = locals_.all(); !iter.empty(); iter.popFront() ) {

nit: no spaces before "auto" or after "popFront()".

::: js/src/asmjs/WasmGenerator.h
@@ +261,5 @@
> + usesAtomics_ = true;
> + }
> +
> + bool usesSignalsForInterrupts() const {
> + return m_ && m_->args().useSignalHandlersForInterrupt;

I think the "m_ &&" is unnecessary (m_ is initialized immediately by MG::startFuncDef()). Also it'd be creepy if the value of this seemingly-immutable compilation parameter had a varying value.

::: js/src/asmjs/WasmIonCompile.cpp
@@ +3476,5 @@
> + return wasm::IonCompileFunction(task);
> + case wasm::IonCompileTask::CompileMode::Baseline:
> + return wasm::BaselineCompileFunction(task);
> + default:
> + MOZ_CRASH("Uninitialized task");

Can you add the third case; 'default' silences the compiler's "you forgot a case" warning.

Attachment #8759163 - Flags: review?(luke) → review+

Lars T Hansen [:lth]

Assignee

Comment 42

9 years ago

(In reply to Luke Wagner [:luke] from comment #41)
> Comment on attachment 8759163
> bug1232205-infrastructure-v2.patch
>
> ::: js/src/asmjs/WasmIonCompile.cpp
> @@ +3476,5 @@
> > + return wasm::IonCompileFunction(task);
> > + case wasm::IonCompileTask::CompileMode::Baseline:
> > + return wasm::BaselineCompileFunction(task);
> > + default:
> > + MOZ_CRASH("Uninitialized task");
>
> Can you add the third case; 'default' silences the compiler's "you forgot a
> case" warning.

Ho hum. As discussed on IRC with Benjamin, gcc 5.2.1 on my Linux box will then complain that there are switch arms that do not return a value, and I'm not sure if that's better. (I assume this is a gcc or glibc bug, since MOZ_CRASH calls abort and abort "should" be marked as noreturn.) But sure, I can hack something up.
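One common way to satisfy both constraints (no `default`, so adding an enumerator triggers gcc/clang's `-Wswitch`, and every path visibly returns or does not return) is to list every enumerator and place the crash after the switch. A hypothetical standalone sketch: `CompileMode`, `compileFunction`, and the stub compilers stand in for the real `IonCompileTask` code, and `std::abort` (declared `[[noreturn]]`) stands in for `MOZ_CRASH`.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

enum class CompileMode { None, Ion, Baseline };

// Stubs standing in for the real per-tier compilers.
bool ionCompile() { return true; }
bool baselineCompile() { return true; }

bool compileFunction(CompileMode mode) {
    // No 'default': adding an enumerator makes -Wswitch flag this switch.
    switch (mode) {
      case CompileMode::Ion:
        return ionCompile();
      case CompileMode::Baseline:
        return baselineCompile();
      case CompileMode::None:
        break;
    }
    // Reached for None (or a corrupted value). std::abort is [[noreturn]],
    // so there is no "control reaches end of non-void function" warning.
    std::fprintf(stderr, "Uninitialized task\n");
    std::abort();
}
```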

Lars T Hansen [:lth]

Assignee

9 years ago

Attached patch bug1232205-ool-truncate-v2.patch

Patch 5/8, v2: out-of-line truncate code. I'm going to ask for a re-review here because I cleaned up the code in MacroAssembler.cpp (outOfLineTruncateSlow) as discussed in my previous comment.
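For readers unfamiliar with out-of-line truncation: on x86/x64 the fast path can use `cvttsd2si`, which produces `INT32_MIN` (the "integer indefinite" value) for NaN and out-of-range inputs, so only that one result needs an out-of-line check that separates a genuine `INT32_MIN` from a trap. The C++ below models those semantics only; it is a hypothetical sketch, not the `outOfLineTruncateSlow` code from the patch, and `std::nullopt` stands in for a wasm trap.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <optional>

// Models cvttsd2si: truncate toward zero, or INT32_MIN if the result does
// not fit in int32 (including NaN).
int32_t fastTruncate(double d) {
    if (!(d > -2147483649.0 && d < 2147483648.0))  // also true for NaN
        return INT32_MIN;
    return static_cast<int32_t>(d);
}

// Out-of-line path, only reached when the fast path produced INT32_MIN.
std::optional<int32_t> slowTruncateCheck(double d) {
    if (d > -2147483649.0 && d < 2147483648.0)  // input was in range
        return INT32_MIN;                        // genuine INT32_MIN
    return std::nullopt;                         // NaN or out of range: trap
}

std::optional<int32_t> truncateF64ToI32(double d) {
    int32_t r = fastTruncate(d);
    if (r != INT32_MIN)
        return r;                 // common case stays inline
    return slowTruncateCheck(d);  // rare case goes out of line
}
```

Keeping the check out of line means the hot path is one conversion plus one compare-and-branch that is almost never taken.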

Attachment #8757949 - Attachment is obsolete: true

Attachment #8759210 - Flags: review?(bbouvier)

Lars T Hansen [:lth]

Assignee

Comment 47

9 years ago

(In reply to Luke Wagner [:luke] from comment #43)
> Comment on attachment 8758708
> bug1232205-wasm-baseline-compiler-v2.patch
>
> Review of attachment 8758708:
> -----------------------------------------------------------------
>
> Ultimately, I don't think it's realistic for me or bbouvier to fully review
> every single function here, so I think the best plan is to rubberstamp and
> land after a few passes and iterate on trunk. That way we also benefit from
> early fuzzing. Speaking of, we should reach out to the fuzzing team after
> landing and to tell them about the new flag.

Good plan.

> ::: js/src/asmjs/WasmBaselineCompile.cpp
> @@ +23,5 @@
> > + * additional tag to indicate the area of improvement.
> > + *
> > + * Mostly the following naming abbreviation is used: "I"=Int32,
> > + * "X"=Int64, "F"=Float32, "D"=Float64. If an unsigned interpretation
> > + * is to be used, a "U" is appended.
>
> We generally use I32/I64/F32/F64 for this distinction (matching the wasm
> spec in naming convention). Would it be ok to use that here too?

I know, and I chose the abbreviation with some trepidation for that reason, but decided to go with it because it is less noisy. Initially my reluctance to the standard notation came from writing things like pop2I32(...) and going slightly insane. Not saying "no", not saying "yes" either, not yet. I may experiment to see how bad I think it is.

> @@ +105,5 @@
> > +using mozilla::SpecificNaN;
> > +
> > +namespace js {
> > +namespace wasm {
> > +namespace baseline {
>
> I'm generally inclined not to have a sub-namespace inside wasm.

I can live with that.

> So for FunctionCompiler, perhaps we could rename
> to BaseCompiler?

And that. (And so on for other naming suggestions you make later.)

> @@ +124,5 @@
> > + // The baseline compiler tracks control items on a stack of its
> > + // own as well.
> > + //
> > + // TODO / REDUNDANT: It would be nice if we could make use of the
> > + // iterator's ControlItems and not require our own stack for that.
>
> It'd be nice talk with Dan about this to see if there's something we could
> do to generalize ExprIter b/c I agree this would be great to do at some
> point in the future (after landing :).

Yes, there's this functionality, and then also a sniff-next-opcode function that would be nice to have for work you've not seen yet and that I currently have to simulate (which is bearable).

> @@ +4293,5 @@
> > + return pushControl(&blockEnd);
> > +}
> > +
> > +void
> > +FunctionCompiler::endBlock()
>
> Just as a style question: do you think it makes sense to have *all*
> non-helper codegen methods out-of-line (like you've done here and below for
> some of the bigger methods), grouped by category with these nice little
> categorical comment blocks like you have above for "blocks and loops"?

Well... We could do that. The current situation is a little messy since there's really no clear dividing line between what's inline and out-of-line. I'd want to measure performance along the way because I'm a little concerned that the C++ compiler will not as readily in-line an out-of-line method, but I could just be paranoid, and again, the current situation is messy and not obviously optimal in any way.

>
> @@ +6177,5 @@
> > +#ifdef JS_CODEGEN_ARM64
> > + // FIXME: There is a hack up at the top to allow the baseline
> > + // compiler to compile on ARM64 (by defining StackPointer), but
> > + // the resulting code cannot run. So prevent it from running.
> > + MOZ_CRASH("Several adjustments required for ARM64 operation");
>
> Heh, I think we'll crash 20 ways if we try to run ARM64 in asm.js/wasm, so I
> wouldn't spend any code/comments on ARM64 other than keeping it building.

Belt and suspenders... :)

Lars T Hansen [:lth]

Assignee

Comment 48

9 years ago

Attached patch bug1232205-long-to-float-and-back-v2.patch

Patch 3/8 v2: Refactored as requested earlier. Looking good, IMO - clearly an improvement.

Attachment #8757947 - Attachment is obsolete: true

Attachment #8759252 - Flags: review?(bbouvier)

Lars T Hansen [:lth]

Assignee

Comment 49

9 years ago

(In reply to Benjamin Bouvier [:bbouvier] from comment #34)
> Comment on attachment 8757952
> bug1232205-test-directives.patch
>
> Review of attachment 8757952:
> -----------------------------------------------------------------
>
> Thanks. Out of curiosity, would putting |test-also-wasm-baseline| in
> lib/wasm.js work too? I don't see any reason why it would, but the testing
> harness has surprised me a few times already with nice features :)

There's no evidence of that going on. Given that wasm.js is loaded dynamically based on not-statically-computable information, there would at least have to be an instrumented load command that feeds information from files it loads back into the (currently running) test harness, which seems like a tall order. Work on an improved test runner has been spun out as bug 1277770.

Lars T Hansen [:lth]

Assignee

Updated

9 years ago

URL: https://github.com/lars-t-hansen/moz-...

Lars T Hansen [:lth]

Assignee

Comment 50

9 years ago

(In reply to Lars T Hansen [:lth] from comment #47)
> (In reply to Luke Wagner [:luke] from comment #43)
> > Comment on attachment 8758708
> > bug1232205-wasm-baseline-compiler-v2.patch
> >
> > @@ +4293,5 @@
> > > + return pushControl(&blockEnd);
> > > +}
> > > +
> > > +void
> > > +FunctionCompiler::endBlock()
> >
> > Just as a style question: do you think it makes sense to have *all*
> > non-helper codegen methods out-of-line (like you've done here and below for
> > some of the bigger methods), grouped by category with these nice little
> > categorical comment blocks like you have above for "blocks and loops"?
>
> Well... We could do that. The current situation is a little messy since
> there's really no clear dividing line between what's inline and out-of-line.
> I'd want to measure performance along the way because I'm a little concerned
> that the C++ compiler will not as readily in-line an out-of-line method, but
> I could just be paranoid, and again, the current situation is messy and not
> obviously optimal in any way.

I did measure performance. There's no discernible difference (sample of one, but it's what we have). I'll go with your suggestion.

Lars T Hansen [:lth]

Assignee

Comment 51

9 years ago

Attached patch bug1232205-wasm-baseline-compiler-v3.patch (obsolete)

This version addresses most of Luke's concerns on the previous patch, and cleans up comments and a few other things.

Outstanding items:
- I've started an email thread about reusing ExprIter's control stack.
- I've not concluded about renaming I => I32, X => I64, etc, but I'm working on it.

Attachment #8758708 - Attachment is obsolete: true

Attachment #8758708 - Flags: review?(luke)

Attachment #8758708 - Flags: review?(bbouvier)

Attachment #8759677 - Flags: review?(luke)

Attachment #8759677 - Flags: review?(bbouvier)

Luke Wagner [:luke]

Comment 52

•

9 years ago

Comment on attachment 8759677 [details] [diff] [review]
bug1232205-wasm-baseline-compiler-v3.patch

Review of attachment 8759677 [details] [diff] [review]:
-----------------------------------------------------------------

Spent a few more hours scanning today, I think it looks good to land. It'll be interesting (later, after x86/ARM/debugger/tiering) to look into the lines you have optimized with TODO/OPTIMIZE since many look quite promising.

::: js/src/asmjs/WasmBaselineCompile.cpp
@@ +107,5 @@
>
> +#include "jit/MacroAssembler-inl.h"
> +#include "jit/shared/CodeGenerator-shared.h"
> +#if defined(JS_CODEGEN_X86) || defined(JS_CODEGEN_X64)
> +# include "jit/x86-shared/CodeGenerator-x86-shared.h"

Are these two CodeGenerator #includes still necessary? I'd hope that we only had a dependency on the MacroAssembler (and anything in the CG we needed would be moved to the MA).

@@ +191,5 @@
> + };
> +
> + typedef Vector<PooledLabel*, 8, SystemAllocPolicy> LabelVector;
> +
> + struct UniquePooledLabelFreePolicy {

{ on newline

@@ +315,5 @@
> + // code density and branch prediction friendliness will be less
> + // important.
> +
> + class OutOfLineCode : public TempObject {
> + private:

2-space indent for public/private labels (here and below)

@@ +504,5 @@
> + // Stack-allocated local slots.
> +
> + int32_t pushLocal(size_t nbytes) {
> + if (nbytes == 8)
> + localSize_ = (localSize_ + 7) & ~7;

To match other places in baldr, could you write localSize_ = AlignBytes(localSize_, nbytes) here and about 4 places below?

@@ +527,5 @@
> +#if defined(JS_CODEGEN_X86) || defined(JS_CODEGEN_X64)
> + uint32_t space = GeneralRegisterSet::NonVolatile().size() * sizeof(intptr_t)
> + + FloatRegisterSet::NonVolatile().getPushSizeInBytes();
> +#else
> + // See definitions of FramePushedAfterSave and NonVolatileRegs in WasmStubs.cpp,

Once we're inside Ion/Baseline code (which we enter via GenerateEntry), there are no non-volatile registers. GenerateEntry takes care of saving the (C++) caller's non-volatiles and after that, nothing is preserved by the Ion ABI. So I'd just remove all non-volatile-saving code in baseline, incl saveSize_.

@@ +529,5 @@
> + + FloatRegisterSet::NonVolatile().getPushSizeInBytes();
> +#else
> + // See definitions of FramePushedAfterSave and NonVolatileRegs in WasmStubs.cpp,
> + // as well as SavedNonVolatileRegisters() in RegisterSets.h: Who is responsible
> + // for saving the link register, and where does it get saved? If it gets pushed

fwiw, lr is saved by PushRetAddr() by GenerateFunctionPrologue as part of the AsmJSFrame.

@@ +1013,5 @@
> + void sync() {
> + size_t start = 0;
> + size_t lim=stk_.length();
> +
> + for ( size_t i=lim ; i > 0 ; i-- ) {

SM whitespace convention is "for (size_t i = lim; i > 0; i--)". I think quite a few of the loops/ifs do this; could you convert the rest too?

@@ +1701,5 @@
> + storeToFrameI(i->gpr(), l.offs);
> + break;
> + case MIRType::Int64:
> + if (i->argInRegister())
> + storeToFrameX(Register64(i->gpr()), l.offs);

It would really be nice to see I32 here instead of X, given that nowhere else uses this I/X/D/F convention so it's a bit jarring if you're hopping in and out.
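The open-coded (localSize_ + 7) & ~7 is the usual round-up-to-a-power-of-two idiom. The sketch below uses a hypothetical alignBytes helper as a stand-in for the AlignBytes that Luke mentions (its exact SpiderMonkey signature is not shown here) and checks that it agrees with the open-coded form. ToyFrame::pushLocal is also hypothetical, and having it return the slot's end offset is an assumption.

```cpp
#include <cassert>
#include <cstdint>

constexpr bool isPowerOfTwo(uint32_t x) { return x != 0 && (x & (x - 1)) == 0; }

// Hypothetical stand-in for an AlignBytes-style helper: round `bytes` up to
// the next multiple of `alignment`, which must be a nonzero power of two.
uint32_t alignBytes(uint32_t bytes, uint32_t alignment)
{
    assert(isPowerOfTwo(alignment));
    return (bytes + alignment - 1) & ~(alignment - 1);
}

// The open-coded form from the patch, specialized to 8-byte slots.
uint32_t alignTo8(uint32_t bytes)
{
    return (bytes + 7) & ~7u;
}

// Hypothetical slot allocator: aligning to the slot size itself lets
// 4-byte and 8-byte locals share one code path.
struct ToyFrame {
    uint32_t localSize = 0;

    int32_t pushLocal(uint32_t nbytes) {
        localSize = alignBytes(localSize, nbytes);
        localSize += nbytes;
        return static_cast<int32_t>(localSize);  // end offset of the new slot
    }
};
```

For example, pushing a 4-byte local and then an 8-byte local pads the frame from 4 to 8 before the 8-byte slot, giving end offsets 4 and 16.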

Attachment #8759677 - Flags: review?(luke) → review+

Lars T Hansen [:lth]

Assignee

Comment 53

•

9 years ago

Attached patch bug1232205-infrastructure-addendum.patch — Details — Splinter Review

The sniffing of SIMD usage in the first patch ("infrastructure") is not quite sufficient: SIMD can be used even if it does not show up in the types, operations, or constructors, namely as literals, as in these two cases:

  function f() { return i16x8(1,2,3,4,5,6,7,8); }
  function f() { i4(1,2,3,4) }

I found these by running all the asm.js tests with --wasm-always-baseline, since some SIMD tests are not yet run automatically.
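The gap can be reproduced in miniature. A check that looks only at signature and local types and at SIMD operations and constructors never sees a SIMD literal whose value is discarded, as in the i4(1,2,3,4) case. Everything in this sketch is hypothetical: the types, opcodes, Function, and usesSimd are illustrative names, not the patch's.

```cpp
#include <cassert>
#include <vector>

// Hypothetical type and opcode sets, loosely modeled on asm.js SIMD.
enum class Type { I32, F64, Void, I32x4, I16x8 };
enum class Op { I32Const, I32x4Const, I16x8Const, I32x4Add, I32x4Ctor, Drop, Return };

struct Function {
    std::vector<Type> signature;  // params and result
    std::vector<Type> locals;
    std::vector<Op> body;
};

static bool isSimdType(Type t) { return t == Type::I32x4 || t == Type::I16x8; }

static bool isSimdOp(Op op, bool includeLiterals)
{
    switch (op) {
      case Op::I32x4Add:
      case Op::I32x4Ctor:
        return true;
      case Op::I32x4Const:
      case Op::I16x8Const:
        return includeLiterals;  // the case the first version missed
      default:
        return false;
    }
}

// Returns true if f may use SIMD. With includeLiterals == false this
// reproduces the gap described above.
bool usesSimd(const Function& f, bool includeLiterals)
{
    for (Type t : f.signature)
        if (isSimdType(t)) return true;
    for (Type t : f.locals)
        if (isSimdType(t)) return true;
    for (Op op : f.body)
        if (isSimdOp(op, includeLiterals)) return true;
    return false;
}
```

A body consisting of a discarded SIMD literal, with a void signature and no locals, is reported as SIMD-free unless literals are checked.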

Attachment #8760145 - Flags: review?(luke)

Benjamin Bouvier [:bbouvier] (inactive)

Comment 54

•

9 years ago

Comment on attachment 8759210 [details] [diff] [review]
bug1232205-ool-truncate-v2.patch

Review of attachment 8759210 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!

::: js/src/jit/MacroAssembler.cpp
@@ +1816,5 @@
> + if (widenFloatToDouble) {
> + MOZ_ASSERT(src.isSingle());
> + srcSingle = src;
> + src = src.asDouble();
> + push(srcSingle);

Really not a big deal, but do we need the srcSingle variable at all?

    if (widen) {
        MOZ_ASSERT(src.isSingle());
        push(src);
        convertFloat32ToDouble(src, src.asDouble());
        src = src.asDouble();
    }

And below:

    if (widen) {
        MOZ_ASSERT(src.isDouble());
        pop(src.asSingle());
    }

For what it's worth, MIPS32 and MIPS64 (and also at least x64) push the double's content of a FloatRegister, as shows e.g. https://dxr.mozilla.org/mozilla-central/source/js/src/jit/mips32/MacroAssembler-mips32.cpp#739 (push(FloatRegister) calls ma_push(FloatRegister) on ARM-masm-like platforms). Feel free to make a more specific #ifdef as you expressed in the comment (ifdef x64 and x86), it definitely makes sense to me to express a clearer intent here. Probably mips can do as ARM does, as ARM has the same aliasing constraints? (unless we use the upper single of FloatRegister as independent virtual registers)

::: js/src/jit/MacroAssembler.h
@@ +1596,5 @@
> void convertTypedOrValueToFloatingPoint(TypedOrValueRegister src, FloatRegister output,
> Label* fail, MIRType outputType);
>
> + void outOfLineTruncateSlow(FloatRegister src, Register dest, bool widenFloatToDouble,
> + bool compilingAsmJS);

Feel free to rename the last arg as compilingWasm.
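The widenFloatToDouble path relies on every float32 value being exactly representable as a float64, so widening before the slow truncation cannot change the result. Below is a portable sketch. It assumes the asm.js slow path implements JavaScript ToInt32 semantics (NaN and infinities become 0; otherwise truncate toward zero and wrap modulo 2^32). toInt32 and truncateFloatViaDouble are hypothetical names, and this is not the MacroAssembler code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Sketch of JS ToInt32 semantics: NaN and infinities map to 0; otherwise
// truncate toward zero and wrap modulo 2^32 into the int32 range.
int32_t toInt32(double d)
{
    if (!std::isfinite(d))
        return 0;
    const double two32 = 4294967296.0;
    double m = std::fmod(std::trunc(d), two32);  // exact, in (-2^32, 2^32)
    if (m < 0)
        m += two32;                              // now in [0, 2^32)
    int64_t v = static_cast<int64_t>(m);
    if (v >= 2147483648LL)
        v -= 4294967296LL;                       // reinterpret as signed
    return static_cast<int32_t>(v);
}

// Float input: widen first. The widening is exact, so the truncated
// result is the same as if we could truncate the float directly.
int32_t truncateFloatViaDouble(float f)
{
    double widened = static_cast<double>(f);
    assert(std::isnan(f) || static_cast<float>(widened) == f);
    return toInt32(widened);
}
```

For example, 4294968320.0f (2^32 + 1024, exactly representable as a float) truncates to 1024 after widening.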

Attachment #8759210 - Flags: review?(bbouvier) → review+

Lars T Hansen [:lth]

Assignee

Comment 55

•

9 years ago

Attached patch bug1232205-wasm-baseline-compiler-v4.patch (obsolete) — Details — Splinter Review

Addresses all of Luke's concerns from his review of v3, except for the thing about the control stack which I'll do in a followup bug. No other functional changes from v3. (An interdiff with v3 is likely to be larger than the patch, due to the completely pervasive naming changes.) Carrying Luke's r+.

Attachment #8759677 - Attachment is obsolete: true

Attachment #8759677 - Flags: review?(bbouvier)

Attachment #8760251 - Flags: review?(bbouvier)

Lars T Hansen [:lth]

Assignee

Updated

•

9 years ago

Attachment #8760251 - Flags: review+

Lars T Hansen [:lth]

Assignee

Comment 56

•

9 years ago

(In reply to Benjamin Bouvier [:bbouvier] from comment #54)
> Comment on attachment 8759210 [details] [diff] [review]
> bug1232205-ool-truncate-v2.patch
>
> Review of attachment 8759210 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> Thanks!
>
> ::: js/src/jit/MacroAssembler.cpp
> @@ +1816,5 @@
> > + if (widenFloatToDouble) {
> > + MOZ_ASSERT(src.isSingle());
> > + srcSingle = src;
> > + src = src.asDouble();
> > + push(srcSingle);
>
> Really not a big deal, but do we need the srcSingle variable at all?

Potato-potahto... a matter of taste. I happen to dislike the non-caching style you suggest above, but there's nothing objective about that, just a C programmer grumbling about kids these days.

> For what it's worth, MIPS32 and MIPS64 (and also at least x64) push the
> double's content of a FloatRegister, as shows e.g.
> https://dxr.mozilla.org/mozilla-central/source/js/src/jit/mips32/MacroAssembler-mips32.cpp#739
> (push(FloatRegister) calls ma_push(FloatRegister) on ARM-masm-like platforms).

Yes. I did minimal cleanup here but I don't know how general the logic *really* needs to be. It's likely this code really should be in platform files.

> Feel free to make a more specific #ifdef as you expressed in the comment
> (ifdef x64 and x86), it definitely makes sense to me to express a clearer
> intent here.

OK.

> Probably mips can do as ARM does, as ARM has the same aliasing
> constraints? (unless we use the upper single of FloatRegister as independent
> virtual registers)

I will let the MIPS build MOZ_CRASH and let the MIPS people clean it up. cc'ing now :)

> ::: js/src/jit/MacroAssembler.h
> @@ +1596,5 @@
> > void convertTypedOrValueToFloatingPoint(TypedOrValueRegister src, FloatRegister output,
> > Label* fail, MIRType outputType);
> >
> > + void outOfLineTruncateSlow(FloatRegister src, Register dest, bool widenFloatToDouble,
> > + bool compilingAsmJS);
>
> Feel free to rename the last arg as compilingWasm.

Will do so.

Luke Wagner [:luke]

Updated

•

9 years ago

Attachment #8760145 - Flags: review?(luke) → review+

Benjamin Bouvier [:bbouvier] (inactive)

Comment 57

•

9 years ago

Comment on attachment 8759252 [details] [diff] [review]
bug1232205-long-to-float-and-back-v2.patch

Review of attachment 8759252 [details] [diff] [review]:
-----------------------------------------------------------------

Cool, thank you.

::: js/src/jit/x64/MacroAssembler-x64.cpp
@@ +103,5 @@
> + j(Assembler::Signed, &isSigned);
> + vcvtsq2sd(input, output, output);
> + jump(&done);
> +
> + bind(&isSigned);

I think here and below in the float32 equivalent, we've lost the comment that we divide by 2, convert and multiply by two. It's just weird to have a comment for the unsigned case and not for the signed case, so we can either remove both or keep both.
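The divide-by-2, convert, multiply-by-2 strategy exists because vcvtsq2sd only converts signed 64-bit integers. Here is a portable C++ sketch of the same idea using a hypothetical uint64ToDouble. It also ORs the shifted-out low bit back into the halved value so the result stays correctly rounded; this excerpt doesn't show whether the patch does exactly that.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned 64-bit -> double using only a signed 64-bit conversion.
double uint64ToDouble(uint64_t u)
{
    if ((u >> 63) == 0) {
        // Top bit clear: the value fits in int64_t, so convert directly.
        return static_cast<double>(static_cast<int64_t>(u));
    }
    // Top bit set (it would look negative as a signed value): halve it,
    // keeping the shifted-out bit as a sticky bit in bit 0 so rounding
    // still sees it, convert as signed, then multiply by two.
    uint64_t halved = (u >> 1) | (u & 1);
    double d = static_cast<double>(static_cast<int64_t>(halved));
    return d + d;
}
```

The final doubling is exact. The converted value is at most 2^63, and adding a double to itself only increments its exponent.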

Attachment #8759252 - Flags: review?(bbouvier) → review+

Lars T Hansen [:lth]

Assignee

Updated

•

9 years ago

Blocks: 1278635

Lars T Hansen [:lth]

Assignee

Comment 58

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2e93460709f0

Lars T Hansen [:lth]

Assignee

Comment 59

•

9 years ago

Attached patch bug1232205-wasm-baseline-compiler-v4.patch — Details — Splinter Review

Incorporates bug fixes for OOM handling and non-unified builds, and refactors the remainderI32 and remainderI64 operations for simplicity.
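The refactored remainderI32/remainderI64 code isn't shown in this bug. As background on the edge cases any such implementation must handle, here is a sketch of WebAssembly's integer remainder semantics. The helper names are hypothetical, and std::nullopt stands in for a trap. Division by zero traps, and the signed remainder of the minimum value by -1 is 0. In C++ that last expression is undefined behavior, so it must be special-cased.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Signed remainder (i32.rem_s / i64.rem_s style); nullopt models a trap.
template <typename T>
std::optional<T> remainderSigned(T lhs, T rhs)
{
    if (rhs == 0)
        return std::nullopt;  // integer remainder by zero traps
    if (rhs == -1)
        return T(0);          // avoids INT_MIN % -1, undefined in C++
    return lhs % rhs;         // truncating; result takes the sign of lhs
}

// Unsigned remainder (i32.rem_u / i64.rem_u style).
template <typename T>
std::optional<T> remainderUnsigned(T lhs, T rhs)
{
    if (rhs == 0)
        return std::nullopt;
    return lhs % rhs;
}
```

Handling rhs == -1 before the hardware remainder matters on x86 in particular, where the idiv instruction faults on the minimum-value / -1 case.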

Attachment #8761123 - Flags: review?(bbouvier)

Lars T Hansen [:lth]

Assignee

Updated

•

9 years ago

Attachment #8760251 - Attachment is obsolete: true

Attachment #8760251 - Flags: review?(bbouvier)

Lars T Hansen [:lth]

Assignee

Comment 60

•

9 years ago

Note to reviewers, the patches on this bug as attached are from various rebasings, not all of them the same, and they may not apply together to any particular repo version. For a link to a Mercurial bundle containing the most recent coherent set, with a defined base patch in m-i, see the URL field of this bug.

Lars T Hansen [:lth]

Assignee

Comment 61

•

9 years ago

Try run is 100% green: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fe0b3d7270ba

Lars T Hansen [:lth]

Assignee

Comment 62

•

9 years ago

Note to self: Something broke in the test runner: the wast tests are no longer being run with the baseline compiler despite spec.js containing the correct directive.

Benjamin Bouvier [:bbouvier] (inactive)

Comment 63

•

9 years ago

Whiteboard: [games:p1]

Desigan Chinniah [:cyberdees] [:dees] [London - GMT]

Updated

•

9 years ago

Whiteboard: [games:p1] → [games:p1][platform-rel-Games]

Christian Holler (:decoder)

Updated

•

9 years ago

Flags: needinfo?(choller)

Steve Singer (:stevensn)

Updated

•

9 years ago

Depends on: 1281961

bug1232205-wasm-baseline.bundle 9 years ago Lars T Hansen [:lth] 52.23 KB, application/octet-stream		Details
bug1232205-wasm-baseline.bundle 9 years ago Lars T Hansen [:lth] 36.98 KB, application/octet-stream		Details
bug1232205-wasm-baseline.bundle 9 years ago Lars T Hansen [:lth] 51.73 KB, application/octet-stream		Details
combined.patch 9 years ago Dan Gohman [:sunfish] 234.67 KB, patch		Details | Diff | Splinter Review
bug1232205-wasm-baseline.bundle 9 years ago Lars T Hansen [:lth] 37.04 KB, application/octet-stream		Details
bug1232205-wasm-baseline.bundle 9 years ago Lars T Hansen [:lth] 38.47 KB, application/octet-stream		Details
bug1232205-infrastructure.patch 9 years ago Lars T Hansen [:lth] 26.32 KB, patch	bbouvier : review+	Details | Diff | Splinter Review
bug1232205-allsinglemask.patch 9 years ago Lars T Hansen [:lth] 4.09 KB, patch	bbouvier : review+	Details | Diff | Splinter Review
bug1232205-long-to-float-and-back.patch 9 years ago Lars T Hansen [:lth] 11.94 KB, patch		Details | Diff | Splinter Review
bug1232205-floating-min-max.patch 9 years ago Lars T Hansen [:lth] 9.89 KB, patch	bbouvier : review+	Details | Diff | Splinter Review
bug1232205-ool-truncate.patch 9 years ago Lars T Hansen [:lth] 13.52 KB, patch	bbouvier : review+	Details | Diff | Splinter Review
bug1232205-test-driver.patch 9 years ago Lars T Hansen [:lth] 3.79 KB, patch	bbouvier : review+	Details | Diff | Splinter Review
bug1232205-test-directives.patch 9 years ago Lars T Hansen [:lth] 9.28 KB, patch	bbouvier : review+	Details | Diff | Splinter Review
bug1232205-wasm-baseline-compiler.patch 9 years ago Lars T Hansen [:lth] 192.20 KB, patch		Details | Diff | Splinter Review
bug1232205-saveregs-bugfix.patch 9 years ago Lars T Hansen [:lth] 4.97 KB, patch		Details | Diff | Splinter Review
bug1232205-wasm-baseline-compiler-v2.patch 9 years ago Lars T Hansen [:lth] 193.76 KB, patch		Details | Diff | Splinter Review
bug1232205-infrastructure-v2.patch 9 years ago Lars T Hansen [:lth] 22.63 KB, patch	luke : review+ lth : review+	Details | Diff | Splinter Review
bug1232205-ool-truncate-v2.patch 9 years ago Lars T Hansen [:lth] 14.77 KB, patch	bbouvier : review+	Details | Diff | Splinter Review
bug1232205-long-to-float-and-back-v2.patch 9 years ago Lars T Hansen [:lth] 13.42 KB, patch	bbouvier : review+	Details | Diff | Splinter Review
bug1232205-wasm-baseline-compiler-v3.patch 9 years ago Lars T Hansen [:lth] 191.15 KB, patch	luke : review+	Details | Diff | Splinter Review
bug1232205-infrastructure-addendum.patch 9 years ago Lars T Hansen [:lth] 4.15 KB, patch	luke : review+	Details | Diff | Splinter Review
bug1232205-wasm-baseline-compiler-v4.patch 9 years ago Lars T Hansen [:lth] 191.11 KB, patch	lth : review+	Details | Diff | Splinter Review
bug1232205-wasm-baseline-compiler-v4.patch 9 years ago Lars T Hansen [:lth] 191.86 KB, patch	bbouvier : review+	Details | Diff | Splinter Review