The existing JM2 architecture suffices as a baseline compiler, but further optimization and specialization work is difficult, extremely hacky, or impossible.
There are two problems: the lack of an IR and the design of the compiler itself. The first pass computes stack depths, and the second pass splats out a little template of assembly per opcode. Operations aren't broken down into units small enough for typical optimizations, so cross-basic-block regalloc and LICM are ad hoc or limited.
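To make the point about unit size concrete, here is a toy sketch in Python. The op names `guard_shape`, `load_slots` and `load_slot` and the instruction format are hypothetical, not JM's or Ion's real IR. A per-opcode template emits a shape guard and a slots-array load for every property read, even for two reads of the same object in a row. Once those pieces are separate IR instructions, simple local value numbering over one basic block can drop the repeats, assuming nothing in between might change the object's shape or reallocate its slots.

```python
from typing import NamedTuple, Optional, Tuple


class Ins(NamedTuple):
    dest: Optional[str]   # value defined by this instruction, if any
    op: str
    args: Tuple


# Hypothetical op names: these repeat identical work if the heap is unchanged.
CSE_OPS = {"guard_shape", "load_slots"}
# Hypothetical ops that might change a shape or reallocate a slots array.
CLOBBER_OPS = {"call", "set_prop"}


def local_cse(block):
    """Local value numbering over a single basic block (a sketch)."""
    available = {}   # (op, args) -> dest of the first instruction computing it
    rename = {}      # dest of a dropped instruction -> surviving dest
    out = []
    for ins in block:
        ins = ins._replace(args=tuple(rename.get(a, a) for a in ins.args))
        if ins.op in CLOBBER_OPS:
            available.clear()          # conservatively forget everything
        elif ins.op in CSE_OPS:
            key = (ins.op, ins.args)
            if key in available:
                if ins.dest is not None:
                    rename[ins.dest] = available[key]
                continue               # redundant: drop it
            available[key] = ins.dest
        out.append(ins)
    return out


# o.x + o.y, lowered to separate guard / slots / slot operations.
block = [
    Ins(None, "guard_shape", ("o", "S1")),
    Ins("s1", "load_slots", ("o",)),
    Ins("x", "load_slot", ("s1", 0)),
    Ins(None, "guard_shape", ("o", "S1")),
    Ins("s2", "load_slots", ("o",)),
    Ins("y", "load_slot", ("s2", 1)),
    Ins("r", "add", ("x", "y")),
]
optimized = local_cse(block)
```

Treating every call as a clobber is the crudest possible alias analysis; a real IR would track finer-grained effects, and a cross-block version would typically walk the dominator tree.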
The plan is to design a new set of IRs that make new optimizations possible. We should also be able to break existing ops down into smaller units; for example, we should be able to CSE redundant shape guards or slots-array loads. On top of the IRs we will have a new code generator that reuses the Nitro assembler. Some planned features:
* Advanced linear scan regalloc across basic blocks
* Interval analyses
* Advanced specialization, via profiling and type inference
* Better handling of boxing formats, especially on x64
* More robust recompilation, elimination of compartment-wide debug mode
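For reference, the classic linear scan algorithm (Poletto and Sarkar) looks roughly like the sketch below. It assumes live intervals have already been computed, which is where interval analyses come in, and it is the simple one-range-per-value version, not the more advanced cross-block allocator planned here. When no register is free, it spills whichever interval ends furthest away.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    name: str
    start: int
    end: int   # inclusive position of the last use


def linear_scan(intervals, registers):
    """Classic linear scan; maps each interval name to a register or "spill"."""
    assignment = {}
    active = []                 # intervals currently holding a register
    free = list(registers)
    for cur in sorted(intervals, key=lambda iv: iv.start):
        # Expire intervals that ended before the current one starts.
        still_active = []
        for iv in active:
            if iv.end < cur.start:
                free.append(assignment[iv.name])
            else:
                still_active.append(iv)
        active = still_active

        if free:
            assignment[cur.name] = free.pop(0)
            active.append(cur)
            continue

        # No free register: spill the interval that ends last.
        victim = max(active, key=lambda iv: iv.end)
        if victim.end > cur.end:
            assignment[cur.name] = assignment[victim.name]
            assignment[victim.name] = "spill"
            active.remove(victim)
            active.append(cur)
        else:
            assignment[cur.name] = "spill"
    return assignment


intervals = [Interval("a", 0, 10), Interval("b", 1, 3),
             Interval("c", 2, 8), Interval("d", 4, 6)]
allocation = linear_scan(intervals, ["r0", "r1"])
```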
A more hopeful, but perhaps not realistic goal is to make this new compiler capable of being both an optimizing, type-specializing compiler, and a generic baseline compiler to replace JM. I think we'll know pretty early on whether this is feasible (i.e. whether it can be done without ruining the design or compile times), and if not, we can also focus on simplifying JM.
> A more hopeful, but perhaps not realistic goal is to make this new compiler
> capable of being both an optimizing, type-specializing compiler, and a generic
> baseline compiler to replace JM. I think we'll know pretty early on whether
> this is feasible (i.e. whether it can be done without ruining the design or
> compile times), and if not, we can also focus on simplifying JM.
Let's assume the latter. What does the execution pipeline look like?
- Interpret a bit
- JaegerMonkey-compile (with type inference) slightly hot code
- IonMonkey-compile (with type inference and IRs) hotter code
- Trace-compile really hot code?
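A minimal sketch of how counter-based tiering between the first three stages could work. The thresholds and the `Function`/`on_call` names are made up for illustration; real values would come from tuning, and the speculative tracing tier is left out.

```python
from enum import Enum


class Tier(Enum):
    INTERPRETER = 0
    BASELINE = 1     # e.g. JaegerMonkey in the list above
    OPTIMIZING = 2   # e.g. IonMonkey


# Hypothetical thresholds, purely illustrative.
BASELINE_THRESHOLD = 10
OPTIMIZING_THRESHOLD = 1000


class Function:
    def __init__(self, name):
        self.name = name
        self.use_count = 0
        self.tier = Tier.INTERPRETER


def on_call(fn):
    """Bump the hotness counter; move up at most one tier per call."""
    fn.use_count += 1
    if fn.tier is Tier.INTERPRETER and fn.use_count >= BASELINE_THRESHOLD:
        fn.tier = Tier.BASELINE
    elif fn.tier is Tier.BASELINE and fn.use_count >= OPTIMIZING_THRESHOLD:
        fn.tier = Tier.OPTIMIZING
    return fn.tier


f = Function("f")
tiers = [on_call(f) for _ in range(1000)]
```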
It sounds like TraceMonkey's days are numbered, which isn't necessarily a bad thing. With type inference in place everywhere, the only compelling advantage it provides is inlining of small functions.
The type inference branch currently inlines functions, and Ion will too. To be honest, I'm not sure what the pipeline will look like yet, and it may change as we figure out how to tune it.
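As an illustration of what inlining small functions means at the IR level, here is a one-level inliner over a tiny, hypothetical expression IR. `known_callees` stands in for what type inference would prove about a call site's target, and `max_size` is an arbitrary size budget. Substitution copies argument expressions into the body, so this sketch assumes arguments are pure.

```python
# Hypothetical expression IR:
#   ("const", n) | ("var", name) | ("add", a, b) | ("call", fn_name, (args...))

def size(e):
    """Node count, used as a crude inlining cost."""
    if e[0] in ("const", "var"):
        return 1
    if e[0] == "add":
        return 1 + size(e[1]) + size(e[2])
    return 1 + sum(size(a) for a in e[2])


def substitute(e, env):
    """Replace parameter variables with the argument expressions in env."""
    tag = e[0]
    if tag == "var":
        return env.get(e[1], e)
    if tag == "add":
        return ("add", substitute(e[1], env), substitute(e[2], env))
    if tag == "call":
        return ("call", e[1], tuple(substitute(a, env) for a in e[2]))
    return e


def inline(e, known_callees, max_size=8):
    """Inline calls whose target is known and whose body is small (one level)."""
    tag = e[0]
    if tag == "add":
        return ("add", inline(e[1], known_callees, max_size),
                inline(e[2], known_callees, max_size))
    if tag == "call":
        args = tuple(inline(a, known_callees, max_size) for a in e[2])
        callee = known_callees.get(e[1])   # None: target unknown, keep the call
        if callee is not None and size(callee[1]) <= max_size:
            params, body = callee
            return substitute(body, dict(zip(params, args)))
        return ("call", e[1], args)
    return e


callees = {"inc": (("x",), ("add", ("var", "x"), ("const", 1)))}
expr = ("add", ("call", "inc", (("var", "n"),)), ("const", 2))
```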
But, one of the up-front design decisions I'd like to make is that Ion code will never call into the tracer. It's an insanely complex path... and reducing the JIT transition matrix is probably best for everyone's sanity. *Especially* Bill's :)
My hope is that Ion won't just be fixing problems in JM, but also TM. TM suffers from an inability to despecialize both types and control flow, as it can't separate compilation from execution. Its IR is also fairly inflexible after it's been emitted.
These are things we will fix in Ion. And when we don't have type inference, we'll be able to collect type information via profiling. One backend, able to feed from multiple sources of information, should get us both JM3 and TM2. Maybe not right away, but it sounds like a good long-term goal.
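A toy model of profiling-driven type specialization with a guard and despecialization, written as a Python class rather than generated code. `AddSite`, the warm-up count and the rule of never re-specializing after a bailout are all assumptions for illustration, not a description of how Ion will do it.

```python
from collections import Counter


class AddSite:
    """One hypothetical `a + b` site that specializes from profiling data.

    After `warmup` executions, if only ints have been seen, the site uses an
    int-only fast path behind a type guard. A failed guard bails out and
    despecializes the site permanently, so it does not flip-flop.
    """

    def __init__(self, warmup=3):
        self.observed = Counter()   # type name -> times seen
        self.warmup = warmup
        self.calls = 0
        self.specialized = False
        self.bailouts = 0

    def execute(self, a, b):
        self.calls += 1
        if self.specialized:
            if type(a) is int and type(b) is int:   # the type guard
                return a + b     # stands in for an unboxed integer add
            self.bailouts += 1   # guard failed: despecialize
            self.specialized = False
        # Generic path: profile operand types, maybe specialize.
        self.observed[type(a).__name__] += 1
        self.observed[type(b).__name__] += 1
        if (self.bailouts == 0 and self.calls >= self.warmup
                and set(self.observed) == {"int"}):
            self.specialized = True
        return a + b


site = AddSite(warmup=3)
warmup_results = [site.execute(i, i + 1) for i in range(3)]
```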
(In reply to comment #2)
> But, one of the up-front design decisions I'd like to make is that Ion code
> will never call into the tracer. It's an insanely complex path...
Getting rid of the tracer would make that easier. Otherwise the base compiler would have to decide whether to use IonMonkey or TM to do optimizing recompilation, which seems weird.
My gut feeling is that two levels of compilation (base + optimizing) should be enough, though bhackett suggested elsewhere that some kind of trace compilation might be good for white-hot loops. Even if that were the case, it still sounds like a death knell for TM and particularly nanojit: having two assemblers is weird, the momentum is clearly not with TM/nanojit, and nanojit has enough design limitations that designing a new trace JIT from scratch, based on what we've learnt about tracing, sounds more sensible.
We don't take perf regressions, so TM won't die without faster replacement on many benchmarks (not just the Stupids(tm)).
I suspect bhackett is right and we'll want tracing, semi-static type inference, and baseline/profiled JITting. But it sounds like dvander et al. have a good plan to support all three.
(In reply to comment #4)
> We don't take perf regressions, so TM won't die without faster replacement on
> many benchmarks (not just the Stupids(tm)).
> I suspect bhackett is right and we'll want tracing, semi-static type inference,
> and baseline/profiled JITting. But it sounds like dvander et al. have a good
> plan to support all three.
We should list the benchmarks whose performance we care about and put them on AWFY: not just suites, but also microbenchmarks testing individual features important for the web (typed arrays, native getters/setters, ...) and shell tests synthesized from JS-bound web pages (bug 643666). (These could go on a new page, like the test breakdown page, to avoid clutter.) I filed this a couple of weeks ago as bug 649487.
JM+TI won't replace TM, but IonMonkey could, and I think should, replace both TM and JM. IMO, such replacements can't happen without a concrete way to do a broad evaluation of a JS engine's performance, and I agree the main benchmark suites don't cut it by themselves.
Right now, K9O wants bugs that are absolutely required and can be finished quickly. IM will take a bit longer and is more in the "wanted" category, so it is not blocking K9O for now.
To avoid filing a dupe, I tried to find a "Compiler Optimization" / "Be 'nice' While Compiling" bug, but this meta bug was the most recent one (and it is fairly old).
This page, http://glsl.heroku.com/e#12543.1, simply hangs the browser for several minutes (I have a slow computer), and then the compiled code runs really fast.
All the other examples on http://glsl.heroku.com/ that I tried load and compile almost instantly.
It would be great if the compiler were faster on the first example. It is a bug that the browser hangs while compiling and we cannot switch tabs.
The compiler needs to be 'nice' and not hog the CPU; while we wait, we should be able to do other things. This bug is titled "Build a new optimizing Compiler", so this is something that should be considered in its design.
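One way to keep the browser responsive while compiling is to move the compile onto a helper thread and let the main thread keep handling events. The sketch below only shows that structure: `compile_script` and `run_event_loop` are hypothetical, the "compile" just sleeps, and because of CPython's GIL, real CPU-bound work in Python would not overlap like this. Nothing here describes how Firefox actually schedules compilation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def compile_script(source):
    """Hypothetical stand-in for an expensive compile."""
    time.sleep(0.05)
    return f"compiled<{len(source)} chars>"


def run_event_loop(source, executor):
    """Keep servicing (simulated) UI events while a helper thread compiles."""
    future = executor.submit(compile_script, source)
    events_handled = 0
    while not future.done():
        events_handled += 1      # e.g. a tab switch or repaint
        time.sleep(0.001)
    return future.result(), events_handled
```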
When running an example on http://threejs.org/ (specifically http://www.playmapscube.com/), the compiled output crashed. A list of these example pages would make a good test/benchmark farm.
(In reply to Rob from comment #7)
> All the other examples on http://glsl.heroku.com/ that I tried load and
> compile almost instantly.
A great example of how slowly we compile is here: https://www.shadertoy.com/browse
1. Copy that URL.
2. Paste it into a new tab's URL bar.
3. Let the first example load and then start Google Chrome.
4. Paste the URL into Google Chrome's URL bar.
Notice that even though Firefox got a head start and is using most of the CPU, Chrome whips through the page much faster. We are unacceptably slow.