Closed Bug 746225 Opened 13 years ago Closed 12 years ago

IonMonkey: Chunked compilation

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: jandem, Assigned: jandem)

References

(Blocks 1 open bug)

Details

(Whiteboard: [ion:t])

Attachments

(1 file, 2 obsolete files)

At the moment IonMonkey performs poorly on very large scripts. Compilation time hurts Kraken audio-beat-detection, many scripts generated by Emscripten, and so on.

The main reasons are 1) compilation does not scale very well (at least not linearly: large scripts have more locals/uses/vregs/liveness intervals) and 2) large scripts tend to hit many undiscovered/new branches, which can trigger a lot of recompilations.

Like JM, we want Ion to compile large scripts in multiple chunks. It will avoid invalidating the whole script in many cases, make compilation faster and avoid (re)compiling large blocks of (cold) code.

I have a prototype which passes most jit-tests (with a very small chunk size). It still needs a lot of work: it's very hackish, does not yet support tableswitch and generates suboptimal code.
Blocks: 748146
Depends on: 749226
Depends on: 749663
Depends on: 750795
Attached patch WIP v1 (obsolete) — Splinter Review
Mostly works but needs x64/ARM support and cleanup.
Attached patch WIP v2 (obsolete) — Splinter Review
Much better than the previous patch; I need to clean up a few files before I move to x64/ARM support.
Attachment #622815 - Attachment is obsolete: true
How's this going? This seems hard. :-D I have some meta-questions:

- Do we have a solid characterization of the problem to be solved?
- Do we have good measurements about how many recompilations are being caused, how long they are taking, and what phases make them take too long? Do we have numbers on the relative scale of reasons 1 and 2 from comment 0?
- Did we document how Crankshaft solves this problem? Are there any other historical JITs we can look at?

and some questions about the chunked compilation approach itself:

- What are the primary concepts behind chunked compilation? 
  - I thought about it myself for a bit, and it seemed to me that it might be clean to make a chunk as much like a script as possible, and make the transition to a different chunk look like a tail call. That way we'd also be halfway to tail calls :-), which we will almost certainly want someday.

- What CFG regions are valid chunks?
  - Single-entry, single-exit regions seemed best to me.

- How do you actually select the chunks?

- Are the answers to the above questions clearly reflected in the code structure and documents?

- How deeply does chunked compilation modify the existing compiler? How much risk is there to taking this if we end up going with some alternate solution later on?
(In reply to David Mandelin from comment #3)
> - Do we have a solid characterization of the problem to be solved?
> - Do we have good measurements about how many recompilations are being
> caused, how long they are taking, and what phases make them take too long?
> Do we have numbers on the relative scale of reasons 1 and 2 from comment 0?

The main problem is that compilation time does not scale well to super large scripts. This means that even a single compilation of a function with 5000 lines is going to be slow. With TI, we can get accurate type information but it does require many more recompilations and I don't think it's easy to avoid these recompilations.

I don't have exact measurements right now, but I can look into that next week. However, we've seen an emscripten-compiled program, sqlite.js, take at least an hour with Ion (~20 seconds is usual I think) and most of this is compilation time. There are ways to speed up compilation, but it won't be enough: JM+TI had the same problem even though it has a very fast compiler and scales better.

> - Did we document how Crankshaft solves this problem? Are there any other
> historical JITs we can look at?

Crankshaft (and DFG) avoid this problem by not compiling large scripts and using a fast baseline compiler. As far as I know, the plan is for us to eventually get rid of JM, but if we decide to keep it as a baseline compiler (or write a new baseline compiler) we could do something similar. The main problem with this is that we will only be able to do advanced compiler optimizations like LICM or GVN on relatively small scripts.

I think the most important questions to answer are:

1) Do we want to keep a baseline compiler like JM?
2) Do we want to use Ion for large scripts, or is it okay to use the baseline compiler?

> and some questions about the chunked compilation approach itself:
> 
> - What are the primary concepts behind chunked compilation? 
>   - I thought about it myself for a bit, and it seemed to me that it might
> be clean to make a chunk as much like a script as possible, and make the
> transition to a different chunk look like a tail call. That way we'd also be
> halfway to tail calls :-), which we will almost certainly want someday.
> 
> - What CFG regions are valid chunks?
>   - Single-entry, single-exit regions seemed best to me.
> 
> - How do you actually select the chunks?

The way it works right now is that the bytecode emitter emits STARTCHUNK ops when we want to start a new chunk. An advantage of this is that chunk boundaries are always between statements and never in the middle of an expression.

The emitter always inserts chunk boundaries before a loop header and after a loop backedge (so small loops have their own chunk to avoid stack stores/loads in the middle of the loop and still allow loop optimizations like LICM), after X number of instructions and in some other cases (to avoid IonBuilder complexity).
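
A minimal sketch of the boundary rule described above, as a toy model rather than the emitter code from the patch. The op names ("loophead", "backedge"), the function name and the `maxOpsPerChunk` parameter are all hypothetical; the "some other cases" mentioned above are not modelled.

```javascript
// Toy model only: op names and maxOpsPerChunk are hypothetical,
// not SpiderMonkey's real bytecode or its tuning constant.
function splitIntoChunks(ops, maxOpsPerChunk) {
  const chunks = [];
  let current = [];
  const flush = () => {
    if (current.length > 0) {
      chunks.push(current);
      current = [];
    }
  };
  for (const op of ops) {
    // Boundary before a loop header, so the loop starts a fresh chunk.
    if (op === "loophead") flush();
    current.push(op);
    // Boundary after a loop backedge, or once the chunk has enough ops.
    if (op === "backedge" || current.length >= maxOpsPerChunk) flush();
  }
  flush();
  return chunks;
}

const ops = ["a", "b", "loophead", "c", "d", "backedge", "e", "f", "g", "h"];
const chunks = splitIntoChunks(ops, 4);
// Three chunks: "a b", "loophead c d backedge", "e f g h".
console.log(chunks.map((c) => c.join(" ")));
```

In this model a small loop lands in a chunk of its own, which is the property the rule is meant to preserve for LICM.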

I considered single-entry, single-exit but, especially with (labeled) break/continue, a risk is that you get many tiny chunks and therefore a lot of compilation/syncing overhead. LICM is also a lot harder because many loops will consist of multiple chunks. Furthermore, handling multiple entries/exits is not much more complex than handling a single entry/exit since most of the infrastructure is the same.

> - Are the answers to the above questions clearly reflected in the code
> structure and documents?

I'm still cleaning up the code, but yeah I'd like to add some (large) comments describing how the whole process works.

> - How deeply does chunked compilation modify the existing compiler? How much
> risk is there to taking this if we end up going with some alternate solution
> later on?

The main changes are:

1) If there are multiple chunks, every local has its own (reserved) stack slot, to simplify branching to other chunks - this is easy to undo since it affects only a handful of places (and is not used if there is only one chunk).

2) Instead of a single IonScript, we now have one IonScript with multiple IonChunks. A large part of the patch is renaming IonScript to IonChunk in various places. It wouldn't be too hard to merge the two structures later on and rename everything back to IonScript (remove IonChunk and fix all compiler warnings).

3) IonBuilder has a few methods to handle incoming/outgoing jumps. Most of the work there is removing these methods and some lines from their callers.

I hope this answers your questions, let me know if I forgot something.
(In reply to Jan de Mooij (:jandem) from comment #4)
> (In reply to David Mandelin from comment #3)
> > - Do we have a solid characterization of the problem to be solved?
> > - Do we have good measurements about how many recompilations are being
> > caused, how long they are taking, and what phases make them take too long?
> > Do we have numbers on the relative scale of reasons 1 and 2 from comment 0?
> 
> The main problem is that compilation time does not scale well to super large
> scripts. This means that even a single compilation of a function with 5000
> lines is going to be slow. 

I'm curious about the scaling factors. Is it linear in the number of ops, or superlinear? And I've heard that regalloc is the part that really doesn't scale--is that superlinear? I've also heard it relates to the size of snapshots, so does that make it something like (#ops x #vars)?

Btw, I recently heard about LLVM's new register allocation, and supposedly it scales much better than linear-scan-based algorithms, so if we implemented that, it might take a lot of the pressure off, so that we either don't need chunked compilation, or get a lot more freedom about how big a chunk can be.

On the variables, could something with live variables or some other special handling allow us to shrink the snapshots?

> With TI, we can get accurate type information but
> it does require many more recompilations and I don't think it's easy to
> avoid these recompilations.

AIUI, one known problem is this kind of code:

 f() {
  // loop1
  for (...)
    body1;

  // loop2
  for (...)
    body2;
 }

so loop1 gets hot, but loop2 is unseen, so we are compiling |f| but have no type profiling data for loop2. Are all the known problems around sequential loops, or do we get into trouble with nested loops too? Any other scenarios?

> I don't have exact measurements right now, but I can look into that next
> week. However, we've seen an emscripten-compiled program, sqlite.js, take at
> least an hour with Ion (~20 seconds is usual I think) and most of this is
> compilation time. There are ways to speed up compilation, but it won't be
> enough: JM+TI had the same problem even though it has a very fast compiler
> and scales better.

Interesting. I would also be curious about how much of it is from too much cost for the first compilation and how much from too many recompilations.

> > - Did we document how Crankshaft solves this problem? Are there any other
> > historical JITs we can look at?
> 
> Crankshaft (and DFG) avoid this problem by not compiling large scripts and
> using a fast baseline compiler. As far as I know, the plan is for us to
> eventually get rid of JM, but if we decide to keep it as a baseline compiler
> (or write a new baseline compiler) we could do something similar. The main
> problem with this is that we will only be able to do advanced compiler
> optimizations like LICM or GVN on relatively small scripts.

Makes sense. I'm not sure how far we can go near-term, but this seems like a pretty interesting area for the medium-long term. Baseline compiler might be the easiest way to get us landed for now, while we investigate better approaches. I agree we want to try to get advanced optimizations running on big code. One possible future is to do that on a background thread, but I don't think that's an easy short-term solution or anything.

> I think the most important questions to answer are:
> 
> 1) Do we want to keep a baseline compiler like JM?
> 2) Do we want to use Ion for large scripts, or is it okay to use the
> baseline compiler?

What kinds of experiments can we do to get at the answers to those questions?

> > and some questions about the chunked compilation approach itself:
> > 
> > - What are the primary concepts behind chunked compilation? 
> >   - I thought about it myself for a bit, and it seemed to me that it might
> > be clean to make a chunk as much like a script as possible, and make the
> > transition to a different chunk look like a tail call. That way we'd also be
> > halfway to tail calls :-), which we will almost certainly want someday.
> > 
> > - What CFG regions are valid chunks?
> >   - Single-entry, single-exit regions seemed best to me.
> > 
> > - How do you actually select the chunks?
> 
> The way it works right now is that the bytecode emitter emits STARTCHUNK ops
> when we want to start a new chunk. An advantage of this is that chunk
> boundaries are always between statements and never in the middle of an
> expression.

Hmmm. Definitely seems right to avoid boundaries in the middle of an expression.

Do you mean that STARTCHUNK indicates where we *can* start a new chunk (i.e., we may cut it only into a few chunks), or where we *will* start a new chunk (i.e., we will always cut as many chunks as there are STARTCHUNK items)?

> The emitter always inserts chunk boundaries before a loop header and after a
> loop backedge (so small loops have their own chunk to avoid stack
> stores/loads in the middle of the loop and still allow loop optimizations
> like LICM), after X number of instructions and in some other cases (to avoid
> IonBuilder complexity).
> 
> I considered single-entry, single-exit but, especially with (labeled)
> break/continue, a risk is that you get many tiny chunks and therefore a lot
> of compilation/syncing overhead. 

Clarifying, I didn't mean that every single-exit, single-entry region should be a chunk, just that it seemed easiest if chunk boundaries are placed only at those points. I haven't thought too much about break/continue, which I can imagine complicate things--examples would help.

The idea about loops seems good. In fact, it makes me think that it would be great to understand the source of the badness, and then use that to understand where the best cutpoints would be.

I see what you're saying about long loops and needing to cut them in the middle somewhere. That seems a bit unfortunate but perhaps necessary. I still like the idea of allowing chunks to nest, so that you could have chunks like this:

 function f() {
  // start chunk A0
  for (...) {
    // start chunk A1
    ...
    // end chunk A1
    // start chunk A2
    ...
    // end chunk A2
  }
  // end chunk A0
 }

although now that I write it down, I see that you'd need to be careful about designing the loop nesting so that it's not doing too many unnecessary transitions. I guess if you put the loop test itself in one of the chunks as some kind of exit and then made them like tail calls of each other, you could get something pretty good.

> LICM is also a lot harder because many
> loops will consist of multiple chunks. Furthermore, handling multiple
> entries/exits is not much more complex than handling a single entry/exit
> since most of the infrastructure is the same.

Ah, so that's not really much of the complexity?

> > - Are the answers to the above questions clearly reflected in the code
> > structure and documents?
> 
> I'm still cleaning up the code, but yeah I'd like to add some (large)
> comments describing how the whole process works.
> 
> > - How deeply does chunked compilation modify the existing compiler? How much
> > risk is there to taking this if we end up going with some alternate solution
> > later on?
> 
> The main changes are:
> 
> 1) If there are multiple chunks, every local has its own (reserved) stack
> slot, to simplify branching to other chunks - this is easy to undo since it
> affects only a handful of places (and is not used if there is only one
> chunk).
> 
> 2) Instead of a single IonScript, we now have one IonScript with multiple
> IonChunk's. A large part of the patch is renaming IonScript to IonChunk in
> various places. It wouldn't be too hard to merge the two structures later on
> and rename everything back to IonScript (remove IonChunk and fix all
> compiler warnings).
> 
> 3) IonBuilder has a few methods to handle incoming/outgoing jumps. Most of
> the work there is removing these methods and some lines from their callers.
> 
> I hope this answers your questions, let me know if I forgot something.
(In reply to David Mandelin from comment #5)
> I'm curious about the scaling factors. Is it linear in the number of ops, or
> superlinear? And I've heard that regalloc is that part that really doesn't
> scale--is that superlinear? I've also heard it relates to the size of
> snapshots, so does that make it something like (#ops x #vars)?
> 
> Btw, I recently heard about LLVM's new register allocation, and supposedly
> it scales much better than linear-scan-based algorithms, so if we
> implemented that, it might take a lot of the pressure off, so that we either
> don't need chunked compilation, or get a lot more freedom about how big a
> chunk can be.
> 
> On the variables, could something with live variables or some other special
> handling allow us to shrink the snapshots?

It doesn't really matter whether the compiler is linear or superlinear.  Even if compilation time is linear, the number of recompilations for a script will be linear in the size of the script (all else being equal), so you already have quadratic behavior and will die on any large, complicated scripts.  Compilation time in JM+TI is roughly linear in the size of the compiled script, and chunked compilation was still necessary to get acceptable performance on large scripts.
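
The quadratic argument above can be made concrete with a toy cost model. This is a sketch under stated assumptions, not a measurement: compiling n ops is taken to cost n units, one recompilation is assumed per `opsPerRecompile` ops of script, and a chunked recompile is assumed to redo a single chunk. All names and numbers are hypothetical.

```javascript
// Toy cost model; every constant here is an assumption, not profiling data.
function wholeScriptCost(n, opsPerRecompile) {
  const recompiles = Math.ceil(n / opsPerRecompile);
  return recompiles * n; // each recompile redoes the whole script
}

function chunkedCost(n, opsPerRecompile, chunkSize) {
  const recompiles = Math.ceil(n / opsPerRecompile);
  return recompiles * Math.min(chunkSize, n); // each recompile redoes one chunk
}

// A 10x larger script costs 100x more without chunks, 10x more with them.
console.log(wholeScriptCost(1000, 100), wholeScriptCost(10000, 100)); // 10000 1000000
console.log(chunkedCost(1000, 100, 500), chunkedCost(10000, 100, 500)); // 5000 50000
```
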
We decided to postpone chunked compilation for now. Instead we will fall back to JM+TI for very large scripts (bug 755010). The patches here are close, but it's a large change and I think it's better to land this when Ion is more stable.

(In reply to David Mandelin from comment #5)
> I'm curious about the scaling factors. Is it linear in the number of ops, or
> superlinear? And I've heard that regalloc is that part that really doesn't
> scale--is that superlinear? I've also heard it relates to the size of
> snapshots, so does that make it something like (#ops x #vars)?
> 
> Btw, I recently heard about LLVM's new register allocation, and supposedly
> it scales much better than linear-scan-based algorithms, so if we
> implemented that, it might take a lot of the pressure off, so that we either
> don't need chunked compilation, or get a lot more freedom about how big a
> chunk can be.

Interesting, we could also revive the greedy register allocator for such cases, but note that regalloc is not the only problem: GVN and even codegen also show up in profiles.

> On the variables, could something with live variables or some other special
> handling allow us to shrink the snapshots?

Yeah, we should consider optimizing this in the compiler. We could also limit the # of locals we track; JM+TI tracks the first 1000 (IIRC?) and all other locals live on the call object.

My chunked compilation patch gives every local its own reserved/fixed stack slot if there are multiple chunks. The next step could be not storing them in the snapshot, and ideally we could do something like a memcpy from the C stack to the StackFrame slots.

> 
> AIUI, one known problem is this kind of code:
> 
>  f() {
>   // loop1
>   for (...)
>     body1;
> 
>   // loop2
>   for (...)
>     body2;
>  }
> 
> so loop1 gets hot, but loop2 is unseen, so we are compiling |f| but have to
> type profiling data for loop2. Are all the known problems around sequential
> loops, or do we get into trouble with nested loops too? Any other scenarios?

In general, this can happen any time we reach new code. Some other (common) scenarios:

1) Nested loop:

// loop1
for (..) {
  // loop2
  for (..) {
  }
  // No type information for the code here if loop2 is hot.
}

2) if-else statements and switch-statements where we hit a branch for the first time after compiling the function.

> > 
> > 1) Do we want to keep a baseline compiler like JM?
> > 2) Do we want to use Ion for large scripts, or is it okay to use the
> > baseline compiler?
> 
> What kinds of experiments can we do to get at the answers to those questions?

Not sure, but we should at least compare IM vs JM+TI compilation times for small, medium and large functions. If there is a large difference, even for small functions, we will probably need a baseline compiler. Emscripten et al generate tons of code and compilation time is important, even with chunked compilation.

> Do you mean that STARTCHUNK indicates where we *can* start a new chunk
> (i.e., we may cut it only into a few chunks) , or where we *will* start a
> new chunk (i.e., we will always cut as many chunks as there are STARTCHUNK
> items)?

There are two cases:

1) If the script's bytecode length (script->length) is smaller than some constant, we use a single chunk and ignore all STARTCHUNK ops. In this case the overhead of chunk transitions etc. is not worth it.

2) Otherwise, every STARTCHUNK op will start a new chunk.
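
The two cases can be sketched as follows. This is an illustration only: the array length stands in for script->length, and `singleChunkThreshold` is a hypothetical name for the unspecified constant.

```javascript
// Toy model: `ops` is a flat list of op names and its length is used as a
// stand-in for script->length. The threshold value is hypothetical.
function countChunks(ops, singleChunkThreshold) {
  // Case 1: small script, use a single chunk and ignore all STARTCHUNK ops.
  if (ops.length < singleChunkThreshold) return 1;
  // Case 2: every STARTCHUNK op starts a new chunk. Ops before the first
  // STARTCHUNK (if any) form one extra leading chunk.
  const starts = ops.filter((op) => op === "STARTCHUNK").length;
  const leading = ops[0] === "STARTCHUNK" ? 0 : 1;
  return starts + leading;
}

const script = ["op", "STARTCHUNK", "op", "op", "STARTCHUNK", "op"];
console.log(countChunks(script, 100)); // 1
console.log(countChunks(script, 4)); // 3
```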

> I haven't thought too much about break/continue, which I
> can imagine complicate things--examples would help.

Emscripten often generates loops like this:

b: for (;;) {
  // ...
  for (;;) {
     if (..)
       break b;
  }
  // ...
}

If chunks are restricted to single-entry/single-exit, we have to use multiple chunks for the inner loop (the |break b;| would be a second chunk exit).
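
A runnable variant of the loop shape above, to make the two exits of the inner loop visible. The counters and the plain |break| are added purely for illustration and are not part of the Emscripten pattern itself.

```javascript
// Records which way control leaves the inner loop on each outer iteration.
function run(limit) {
  const exits = [];
  let i = 0;
  b: for (;;) {
    i++;
    for (let j = 0; ; j++) {
      if (i >= limit) {
        exits.push("break b");
        break b; // exit 2: jumps past the outer loop
      }
      if (j >= 1) break; // exit 1: continues with the code after the inner loop
    }
    exits.push("fallthrough");
  }
  return exits;
}

console.log(run(3)); // two "fallthrough" exits, then one "break b"
```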

> I see what you're saying about long loops and needing to cut them in the
> middle somewhere. That seems a bit unfortunate but perhaps necessary. I
> still like the idea of allowing chunks to nest, so that you could have
> chunks like this:

Yeah, nesting is a great idea; it may reduce the number of cross-chunk edges. Maybe we should prototype some of these algorithms and compare the resulting chunks..
Attached patch WIP — Splinter Review
Updated patch.
Attachment #623224 - Attachment is obsolete: true
(In reply to Brian Hackett (:bhackett) from comment #6)
> (In reply to David Mandelin from comment #5)
> > I'm curious about the scaling factors. Is it linear in the number of ops, or
> > superlinear? And I've heard that regalloc is that part that really doesn't
> > scale--is that superlinear? I've also heard it relates to the size of
> > snapshots, so does that make it something like (#ops x #vars)?
> > 
> > Btw, I recently heard about LLVM's new register allocation, and supposedly
> > it scales much better than linear-scan-based algorithms, so if we
> > implemented that, it might take a lot of the pressure off, so that we either
> > don't need chunked compilation, or get a lot more freedom about how big a
> > chunk can be.
> > 
> > On the variables, could something with live variables or some other special
> > handling allow us to shrink the snapshots?
> 
> It doesn't really matter whether the compiler is linear or superlinear. 
> Even if compilation time is linear, the number of recompilations for a
> script will be linear in the size of the script (all else being equal), so
> you already have quadratic behavior and will die on any large, complicated
> scripts.  Compilation time in JM+TI is roughly linear in the size of the
> compiled script, and chunked compilation was still necessary to get
> acceptable performance on large scripts.

Thanks--very clear explanation. So getting better compiler scaling, whether it's constant factors or asymptotic, most likely won't solve the problem.

Another thing I don't understand as well as I wish I did is how important having a type profile is to getting started. For a function with a sequence of loops, I can see that if you do need a type profile, then you must need chunked compilation. Is there any way to do without a type profile? Or, maybe another way to put it is, if you tried to compile something anyway without a type profile, what's the best thing you can do?

That also made me think a bit more about Jan's example of a not-taken branch:

  if (...) {
    // then branch
    ...
  } else {
    // else branch
    // * not taken => no type profile =>? can't compile at first
  }

It seems very natural to regard the else branch as a chunk, and in fact to just not compile it at all, but put an exit there. Once it is taken (or there is a type profile, or whatever allows us to do something else), decide whether to (a) compile that chunk independently and patch it in or (b) recompile the function with that branch, presumably based on some kind of cost/benefit model. 
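
Option (a) can be sketched in plain JavaScript as a branch that is only "compiled" the first time it is taken. This is a loose analogy, not how Ion would patch code: `lazyBranch` and the `compile` callback are hypothetical stand-ins for an exit stub and a compiler invocation.

```javascript
// Hypothetical helper: `compile` stands in for compiling one chunk.
function lazyBranch(compile) {
  let code = null; // nothing is compiled until the branch is first taken
  return function (...args) {
    if (code === null) code = compile(); // the "exit": compile on first entry
    return code(...args); // afterwards the compiled chunk is used directly
  };
}

let compiles = 0;
const elseBranch = lazyBranch(() => {
  compiles++;
  return (x) => x * 2;
});

// The then-branch is always available; the else-branch is deferred.
function f(x) {
  return x >= 0 ? x + 1 : elseBranch(x);
}
```

Calling `f` with only non-negative arguments never runs `compile`; the first negative argument triggers it once, and later calls reuse the result.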

Also on that example, does it make any sense to delay compilation longer than default when the type profile is incomplete for the function in a case like that?
(In reply to Jan de Mooij (:jandem) from comment #7)
> We decided to postpone chunked compilation for now. Instead we will fallback
> to JM+TI for very large scripts for now (bug 755010). The patches here are
> close, but it's a large change and I think it's better to land this when Ion
> is more stable.

OK. 

> (In reply to David Mandelin from comment #5)
> > I'm curious about the scaling factors. Is it linear in the number of ops, or
> > superlinear? And I've heard that regalloc is that part that really doesn't
> > scale--is that superlinear? I've also heard it relates to the size of
> > snapshots, so does that make it something like (#ops x #vars)?
> > 
> > Btw, I recently heard about LLVM's new register allocation, and supposedly
> > it scales much better than linear-scan-based algorithms, so if we
> > implemented that, it might take a lot of the pressure off, so that we either
> > don't need chunked compilation, or get a lot more freedom about how big a
> > chunk can be.
> 
> Interesting, we could also revive the greedy register allocator for such
> cases, but note that regalloc is not the only problem, GVN and even codegen
> also show up in profiles.

OK. Brian's point about quadraticness addresses this too. I'm still curious to see the numbers for various benchmarks. They may not be necessary for designing chunked compilation but they might be relevant to regalloc options.

> > On the variables, could something with live variables or some other special
> > handling allow us to shrink the snapshots?
> 
> Yeah we should consider optimizing this in the compiler. We could also limit
> the # of locals we track, JM+TI tracks the first 1000 (IIRC?) and all other
> locals live on the call object.

In existing workloads, do we get 1000+ locals only for Emscripten programs, or for others too? Luke and I were asking ourselves if Emscripten vs. general purpose JS programs are sufficiently different that we just should have two different optimization modes.

> My chunked compilation patch gives every local its own reserved/fixed stack
> slot if there are multiple chunks. The next step could be not storing them
> in the snapshot, and ideally we could do something like a memcpy from the C
> stack to the StackFrame slots.

That would be interesting. I think previous projects have also used compressed snapshots with some success.

> > AIUI, one known problem is this kind of code:
> > 
> >  f() {
> >   // loop1
> >   for (...)
> >     body1;
> > 
> >   // loop2
> >   for (...)
> >     body2;
> >  }
> > 
> > so loop1 gets hot, but loop2 is unseen, so we are compiling |f| but have to
> > type profiling data for loop2. Are all the known problems around sequential
> > loops, or do we get into trouble with nested loops too? Any other scenarios?
> 
> In general, this can happen any time we reach new code. Some other (common)
> scenarios:
> 
> 1) Nested loop:
> 
> // loop1
> for (..) {
>   // loop2
>   for (..) {
>   }
>   // No type information for the code here if we loop2 is hot.
> }

OK. I responded to that a little in my previous comment. Briefly, I was wondering if we should just section off all parts of the function that don't have a type profile, and not compile them at all at first, and then compile each region as we get a type profile (possibly also recompiling the whole thing if it makes sense).

> 2) if-else statements and switch-statements where we hit a branch for the
> first time after compiling the function.
> 
> > > 
> > > 1) Do we want to keep a baseline compiler like JM?
> > > 2) Do we want to use Ion for large scripts, or is it okay to use the
> > > baseline compiler?
> > 
> > What kinds of experiments can we do to get at the answers to those questions?
> 
> Not sure, but we should at least compare IM vs JM+TI compilation times for
> small, medium and large functions. If there is a large difference, even for
> small functions, we will probably need a baseline compiler. Emscripten et al
> generate tons of code and compilation time is important, even with chunked
> compilation.
> 
> > Do you mean that STARTCHUNK indicates where we *can* start a new chunk
> > (i.e., we may cut it only into a few chunks) , or where we *will* start a
> > new chunk (i.e., we will always cut as many chunks as there are STARTCHUNK
> > items)?
> 
> There are two cases:
> 
> 1) If the script's bytecode length (script->length) is smaller than some
> constant, we use a single chunk and ignore all STARTCHUNk ops. It this case
> the overhead of chunk transitions etc is not worth it.
> 
> 2) Otherwise, every STARTCHUNK op will start a new chunk.

Hmmm. I would tend to expect that the generated code will have better perf with bigger chunks, so it seems like it should be tunable.

> > I haven't thought too much about break/continue, which I
> > can imagine complicate things--examples would help.
> 
> Emscripten often generates loops like this:
> 
> b: for (;;) {
>   // ...
>   for (;;) {
>      if (..)
>        break b;
>   }
>   // ...
> }
> 
> If chunks are restricted to single-entry/single-exit, we have to use
> multiple chunks for the inner loop (the |break b;| would be a second chunk
> exit).

Nice example--I think I get it now. It looks like single-entry multiple-exit would work--is that correct?

> > I see what you're saying about long loops and needing to cut them in the
> > middle somewhere. That seems a bit unfortunate but perhaps necessary. I
> > still like the idea of allowing chunks to nest, so that you could have
> > chunks like this:
> 
> Yeah, nesting is a great idea; it may reduce the number of cross-chunk
> edges. Maybe we should prototype some of these algorithms and compare the
> resulting chunks..

That would be cool.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
Could you please state the rationale behind WONTFIXing this? The last info in this bug says "do this later when Ion is more stable".
Presumably because Baseline Compiler and lazy bytecode generation make this unnecessary.
Florian: it's very complicated to implement, and a few things have obviated the need so far: as Ryan said, the baseline compiler, but also off-main-thread compilation for IonMonkey, and asm.js.
Thanks for the explanation, that makes sense!