Closed Bug 777099 Opened 13 years ago Closed 1 year ago

Investigate compiled-JS-specific aggressive IonMonkey compilation

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect

Tracking


RESOLVED INCOMPLETE

People

(Reporter: ehsan.akhgari, Unassigned)

References

Details

(Whiteboard: [js:t][js:ni])

I was talking about this with Kannan today, and we decided that it might be worth discussing in a bug. Here's my idea: we may be able to use a simple heuristic to decide whether a function has been compiled with Emscripten by just looking at all of the variables used in that function and determining what their types are. Emscripten-generated code should only access ints, floats, function objects (to call other functions), and typed arrays, I think. If we can get this to work correctly, then maybe we can start to be very aggressive about compiling those functions with Ion, and there is a good chance that it will help because the type info should be accurate on the first call to the function, so the assumptions that Ion uses to generate code will be true the next time the function is called. Once we get this to work, we can look into optimizing Emscripten-specific patterns (like or's with zero, etc.). I don't know a lot about the JS engine side of things. CCing a bunch of people who might.
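The value-kind check described in comment 0 can be prototyped outside the engine. A toy sketch (the function names are invented, and this is plain JS rather than SpiderMonkey internals), assuming we can observe the set of values a function touches:

```javascript
// Toy version of the heuristic from comment 0: an Emscripten-like
// function should only ever see int/float numbers, functions (callees),
// and typed-array views.
function isEmscriptenLikeValue(v) {
  if (typeof v === "number") return true;   // ints and doubles
  if (typeof v === "function") return true; // calls to other functions
  // Typed arrays, but not DataView (Emscripten uses HEAP* views).
  return ArrayBuffer.isView(v) && !(v instanceof DataView);
}

function looksEmscriptenLike(observedValues) {
  return observedValues.every(isEmscriptenLikeValue);
}
```

Any plain object, string, or DataView in the observed set rules the function out under this sketch.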
I think this is related to bug 767223.
Coincidentally, several of us have been discussing a plan along these lines. By doing a flow-insensitive analysis (which should cover Emscripten code), we could run this analysis on the parse tree and if all the local types check out, we could immediately start jitting the function (on a background thread, even). The resulting jit code should also be rather easily serializable, allowing us to attach it to the appcache. I had planned to experiment with this after bug 767013.
If we're going to be detecting this why don't we flag it explicitly, perhaps using //@ syntax with a version?
(In reply to comment #3)
> If we're going to be detecting this why don't we flag it explicitly, perhaps
> using //@ syntax with a version?

I think we should consider that a last resort. That's a bad thing for the web as far as other similar JS code generators are concerned, but if we come to the conclusion that we are unable to rely on other heuristics, I wouldn't let idealism stand in the way of pragmatism.
That is a good question and there has been a rather long discussion on this subject. The idea is that if we just make up some new language that just so happens to be embedded in JS (via magic comments, ultra-strict type systems, required use of non-standard APIs), then (1) it'll look like cheating and (2) no one else will adopt it, so we won't move the web. If, OTOH, we can have a mechanism that optimizes some reasoned subset of JS, then hopefully 1 and 2 wouldn't apply. E.g., I've seen several blog posts lately that tell webdevs to write monomorphic code, initialize all properties an object will ever get eagerly, etc. In this context, the present optimization could be an addendum: "and use operations (like (a+b)|0) on variables that hint to the compiler the type of values that the variable will hold at runtime". It's a fine line, I realize, hence the long discussion :)
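The `(a+b)|0` idiom mentioned above is observable from plain JS: bitwise OR with zero truncates the result to an int32, which is exactly the kind of type hint a compiler can exploit. A minimal illustration:

```javascript
// Bitwise OR with 0 applies ToInt32, so the result is always a 32-bit
// signed integer -- including wraparound on overflow. An engine seeing
// this can infer the expression is int32-typed.
function addInt32(a, b) {
  return (a + b) | 0;
}
```

For example, `addInt32(2147483647, 1)` wraps to `-2147483648` instead of producing the double `2147483648`.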
I'd be in favor of exposing something like this such that all machine-generated code can take advantage of it. If it's an emscripten-specific heuristic you're not going to do much to help people in the long run since it might break if emscripten's code generation changes enough (or people use another cross-compiler). A tag like "static typed"; at the top of a function body or file could work like "use strict", and as much as people aren't fans of the "use strict" approach, it appears to be something that developers are willing to adopt, and it costs nothing in JS runtimes that don't support it. I have to imagine a tag like that would be much cheaper than doing a heuristic analysis of the entire body of a function. If you do go with a heuristic, the criteria it uses need to be clearly specified so that other code generators can attempt to target them. But then that means the criteria are set in stone and you can't change them...
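To make the comparison with `"use strict"` concrete, here is a purely hypothetical sketch of detecting a `"static typed"` prologue directive, as comment 7 suggests. No engine implements such a directive, and `hasStaticTypedDirective` is an invented name; like `"use strict"`, an unrecognized directive string costs nothing in runtimes that ignore it.

```javascript
// Hypothetical: check whether a function's body begins with a
// "static typed" directive prologue, analogous to "use strict".
// Relies on Function.prototype.toString returning the source text.
function hasStaticTypedDirective(fn) {
  const src = fn.toString();
  const body = src.slice(src.indexOf("{") + 1);
  return /^\s*["']static typed["'];/.test(body);
}
```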
Yep; that's why I was suggesting a versioned tag, with defined/published semantics. Other engines can choose to implement if they want, especially if it's well defined.
A versioned tag would be even more useful if it spit out an error or warning when the constraints specified by the tag aren't met. That would get you into a place where you know whether your code generator is meeting the requirements you think it is, instead of the present where things deoptimize without warning when you make a mistake (or the JS runtime's constraints change).
I don't like the idea of a magic comment or string telling the engine the code is from a specific compiler and version. If this is done just in Firefox, then it just seems wrong - we shouldn't give special treatment to emscripten-generated code, and emscripten-generated code should work well and similarly in all browsers. If this is done in collaboration with other browsers, then in theory it is not so bad. But this seems like the wrong kind of thing to standardize on - I don't think it makes sense to standardize on code from one compiler; we should seek general principles. Also, I think it is very early to conclude that we need this. There are a lot of ideas of how to solve the problems here without magic compiler/version comments. (As luke mentioned, we had a very long talk about this; that is my summary of my position.)
(In reply to Kevin Gadd (:kael) from comment #7)
> approach, it appears to be something that developers are willing to adopt,
> and it costs nothing in JS runtimes that don't support it. I have to imagine
> a tag like that would be much cheaper than doing a heuristic analysis of the
> entire body of a function.

I'm hoping the analysis wouldn't be very expensive; we should be able to avoid running it at all for most functions by ruling them out when the initial parse sees unsupported parse node kinds (costing only a bit flip on parser paths, which will more than amortize the cost). For the set of functions where the analysis does run, there shouldn't be any startup cost (such as building SSA), so the analysis can fail fast when it fails. This is what a good experimental prototype would help us understand.

> If you do go with a heuristic, the criteria it uses need to be clearly
> specified so that other code generators can attempt to target them.

Definitely; this has been a goal from the beginning.

> But then that means the criteria are set in stone and you can't change them...

Well, we wouldn't want to *stop* optimizing programs, but I fully expect that we'd want to expand the set of programs that can be eagerly compiled over time (e.g., to recognize Java/C#-compiled-to-JS uses of BinaryData when we get BinaryData). I agree with Ehsan that an opt-in magic syntax should be considered a last resort; it seems like there are several resorts left to try :)
I agree with Alon that special messages embedded in comments are not desirable, and if we do go down that path, it should be as a last resort (and done begrudgingly). Adding js-style optimization #pragmas, which is what this would essentially be, is not a good direction to take the language. If we want to fix something about the language to enable this, then we should do that through the proper channels, not custom hinting mechanisms.

That said, I honestly doubt that it'd be necessary to go down that path anyway. As Ehsan noted, we can expect Emscripten code to exhibit some clear characteristics: using a small set of well-defined types throughout the program, and being generally type stable.

Luke: On an implementation path - I don't know if we even really have to be as fancy as background-jitting the code or anything. Currently with the hybrid JM + Ion approach, we're choosing to method-jit at a useCount of 40, and ion-jit at a useCount of 10k or so. Ion _should_ do very well on emscripten code (and should do even better going down the line). The simplest first stab at this could be to add the emscripten-heuristic check at the 40-useCount mark, and if it looks good, to use Ion immediately for that compile instead of JM.

In the rare cases where the heuristic returns a false positive, the Ion code will invalidate pretty quickly and we can just set a flag on the script to return false on all subsequent runs of the heuristic. That will ensure that it gets treated like "regular code" from that point on. Over time, we can expand the heuristic to cover more cases of type-stable and numerically intensive code that would benefit from an aggressive ion compilation strategy.
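The tiering policy sketched in the comment above (heuristic check at the 40-useCount mark, permanent fallback on a false positive) might look roughly like this. All names and the `looksTypeStable` stand-in are illustrative, not SpiderMonkey API; only the 40/10k thresholds come from the comment:

```javascript
// Illustrative tiering policy, not engine code.
const JM_THRESHOLD = 40;     // method-jit threshold from the comment
const ION_THRESHOLD = 10000; // normal ion threshold from the comment

function chooseCompiler(script) {
  if (script.useCount >= ION_THRESHOLD) return "ion";
  if (script.useCount >= JM_THRESHOLD) {
    // At the 40-useCount mark, try the emscripten heuristic first,
    // unless it has already produced a false positive for this script.
    if (!script.heuristicFailed && looksTypeStable(script)) return "ion";
    return "jm";
  }
  return "interpreter";
}

function onIonInvalidated(script) {
  // False positive: treat as "regular code" from this point on.
  script.heuristicFailed = true;
}

// Stand-in for the real type-stability analysis.
function looksTypeStable(script) {
  return script.typeStable === true;
}
```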
(In reply to Kannan Vijayan [:djvj] from comment #12)
> On an implementation path - I don't know if we even really have to be as
> fancy as background-jitting the code or anything.

On a particular example I've spent some time with (unfortunately not public, but I can give you the code), we have large initial pauses (10-20sec) with a slow, stuttery start for the first 10 seconds. Compilation time in IM should be at least 2x worse than JM. Fortunately, not all this time is in pure jit compilation, but in other things that eager compilation should avoid (interpreter time, monitoring time, TI analysis time). To remove the stuttery start, we want to compile everything (that has good local type info) before running the first frame, but that will make initial start time even worse. This is why I think we ultimately need parallel IM compilation (for first start) and jit-code-in-appcache (for second+ start). Of course, we don't have to do these things *initially*, but I do think we need to do them.
Wow, 10-20 secs of jagginess is bad. I wouldn't have thought it would be that far gone. The jitcode in appcache is a good idea and would probably be not too hard to implement. The IM compile-on-the-side might be more... tricky. This is presumably true of JM as well, but Ion reaches into runtime objects routinely (objects, prototypes, typesets & typeobjs, shapes, etc.) during compilation. Guarding these against races from a parallel runtime, I expect, will be somewhat delicate work (to put it lightly).
(In reply to Kannan Vijayan [:djvj] from comment #14)
> The IM compile-on-the-side might be more... tricky. This is presumably true
> of JM as well, but Ion reaches into runtime objects routinely (objects,
> prototypes, typesets & typeobjs, shapes, etc.) during compilation.

From talking to dvander, all that should happen at the beginning of the pipeline, during IonBuilder (bytecode -> MIR). If we have eagerly-typed JS, we should be able to go straight to MIR w/o even looking at TI. There would be a small amount of TI necessary (constraints on certain global properties, etc.), but these could be taken care of at a per-compilation (not per-function) level and thus not in parallel. (Or at least, this is the hope.)
Bug 774253 allows IM to compile off-thread, concurrently with main-thread VM stuff. This is geared towards reducing IM compilation cost. It works, but I'm still sorting through perf to make it actually advantageous to do this.

Bug 767223 allows JM compilation and analysis work to happen concurrently with each other, but not with the VM. This is geared towards reducing stutter. It also works, but the prospects for landing it anytime soon are grim.

There are costs to eager compilation besides just the increased risk of later invalidation. Large translated programs are going to contain a lot of cold code (*NOT* dead code), and compiling cold code eagerly will chew up a huge amount of time and memory no matter how you cut things. This was a big problem after JM originally landed. Any viable approach to eager compilation needs to account for cold code; it won't do to just ignore reality and pretend this problem doesn't actually exist.
(In reply to Brian Hackett (:bhackett) from comment #16)
> Any viable approach to eager compilation needs to account for cold code

I definitely agree that cold code is a risk here. How about this approach:

1. Tabs that use a lot of CPU get a lot of eager compilation. Eager compilation is done concurrently (glad to see bug 774253 for that). This guarantees that eventually CPU-intensive games will be fully optimized, while sites with a lot of code that are not CPU intensive will not be. This just leaves non-smooth startup as an issue.

2. Tabs that had a lot of compilation on them will have their compilation saved in the appcache, guaranteeing that the second run will be smooth. This just leaves non-smooth startup on first run as an issue.

3. Websites that we compile a lot phone home anonymously, and we gather statistics on which websites have bad first run. We compile and analyze those on our servers. When the browser starts to load a site, if the JS is large enough we anonymously ping our servers, possibly getting back a response that says whether we should precompile a lot before starting, as well as perhaps a list of which functions should be precompiled (so we can ignore cold ones), and perhaps other hints at how to compile. Using that, we can at least wait a bit to precompile just enough to make first run not stuttery.
Summary: Investigate Emscripten specific aggressive IonMonkey compilation → Investigate compiled-JS-specific aggressive IonMonkey compilation
(In reply to Alon Zakai (:azakai) from comment #17)
> This just leaves non-smooth startup on first run as an issue.
>
> 3. Websites that we compile a lot phone home anonymously, and we gather
> statistics on which websites have bad first run. We compile and analyze
> those on our servers. When the browser starts to load a site, if the JS is
> large enough we anonymously ping our servers, possibly getting back a
> response that says whether we should precompile a lot before starting, as
> well as perhaps a list of which functions should be precompiled (so we can
> ignore cold ones), and perhaps other hints at how to compile. Using that, we
> can at least wait a bit to precompile just enough to make first run not
> stuttery.

This is not a bad idea, but it's not feasible -- it's a privacy violation (we'd know what web sites users are browsing), and it's also not really possible for us to do this analysis on every web page that needs it (or even on most). Having compilation saved in appcache should be enough combined with tabs using a lot of cpu getting a lot of eager compilation -- that'll make first run a little non-smooth, but it should smooth out fairly quickly (as we'll keep doing more and more eager compilation). To avoid the memory hit, we could write the results to disk/appcache aggressively, and then map that code back in and let the OS deal with the memory management.
If cold code needing to be jitted causes poor startup, why not ask JS game developers to run 1-2 frames 'invisibly' to make the code warm so it gets jitted? Maybe expose a way for them to say 'please jit everything I do between these method calls'?
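The warm-up idea above could be approximated in userland today without any new API. A hedged sketch, where `frameFn` stands in for the developer's own frame function and the arguments passed to it are invented:

```javascript
// Run the game's frame function a few times before first paint so the
// hot code crosses the jit use-count thresholds early. The "invisible"
// flag is a hypothetical hint telling the game not to render.
function warmUp(frameFn, frames = 2) {
  for (let i = 0; i < frames; i++) {
    frameFn(/* dt ms */ 16, /* invisible */ true);
  }
}
```

The game would call `warmUp(renderFrame)` once during its loading screen, before starting the real requestAnimationFrame loop.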
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #18)
> Having compilation saved in appcache should be enough combined with tabs
> using a lot of cpu getting a lot of eager compilation -- that'll make first
> run a little non-smooth, but it should smooth out fairly quickly (as we'll
> keep doing more and more eager compilation).

I agree 100%. I only mentioned point 3 to say that we do have long-term options to solve even the problem of first run smoothness.

I do think point 3 is feasible, though: For anonymity, one possible method could be to hash the JS on the client and just send that (securely using https), and of course to not retain any identifying information on our servers. We already have good experience with proper anonymous treatment of user data on large scales with Sync. For the issue of limited server CPU resources, yes, we would need to be careful and only spend resources on the most popular sites and in an amount that our servers can handle. But imagine AAA games: there are only a few dozen launched each year, surely we could optimize for those and it would be a big win to do so.

> To avoid the memory hit, we
> could write the results to disk/appcache aggressively, and then map that
> code back in and let the OS deal with the memory management.

Sounds good.
In the specific case of Emscripten-compiled code, we can easily construct a heuristic call graph that identifies, probabilistically, which code will run early and which code will only run later. Emscripten-generated code makes only named monomorphic function calls, and the callees must be named functions (with the names matching). By starting at the root script and scanning the bytecode, one can obtain a potential set of call edges to other scripts. Iterating on that, it should be easy to build a tree of calls. Within this tree of calls, subtrees that originate from within loops in their caller(s) can be tagged as "likely to be hot". After that, we can order all the scripts, taking the likely-to-be-hot scripts first, then the scripts with low depth, and then scripts with high depth. We can impose a cutoff on that ordered list, and then pre-compile just that subset of scripts. This approach is likely to yield a good heuristic, and involves no special action on the part of the developer (which I think is something we should aim for).

All that said, I was just speaking with Ehsan and I think that when he created this bug he had something a lot more modest in mind (as compared to ahead-of-time jitting and all the considerations that go along with that).

On the desktop, he indicated that Emscripten code (e.g. game code) ran "acceptably" with reasonable frame rates, but would still be improved by simply being more aggressive about invoking ion compilation, but otherwise nothing too radical. I suspect that even the small change of correctly identifying this type of code and choosing Ion over JM would yield a significant boost in the situations he is working with.

On mobile, he indicated that Emscripten code is just really, really slow, and games run at 1fps. I suspect that this issue has more to do with the quality of our codegen for ARM than with anything else.
Here, too, simply finding the right times to be more aggressive about compiling with Ion might lead to some good payoffs. There are some clear culprits for slowdown on ARM. Proper handling of integer overflows, for example (which in emscripten generated code will just be truncated to 32 bits again) would be a biggie. Again, these sorts of optimizations are more easily implemented on Ion going down the line.
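The call-graph ordering described above can be sketched as a small worklist algorithm. The script records and their `{ callee, inLoop }` edges are an invented model standing in for the real bytecode scan:

```javascript
// Sketch of the pre-compile ordering heuristic: walk call edges from the
// root script, mark callees reached through a loop as likely-hot, then
// order scripts hot-first and shallow-first, applying a cutoff.
function orderScripts(scripts, rootName, cutoff) {
  const byName = new Map(scripts.map(s => [s.name, s]));
  const info = new Map([[rootName, { depth: 0, hot: false }]]);
  const queue = [rootName];
  while (queue.length) {
    const name = queue.shift();
    const script = byName.get(name);
    if (!script) continue;
    const { depth, hot } = info.get(name);
    for (const { callee, inLoop } of script.calls) {
      if (!info.has(callee)) {
        // A callee inside a loop (or below one) is tagged likely-hot.
        info.set(callee, { depth: depth + 1, hot: hot || inLoop });
        queue.push(callee);
      }
    }
  }
  return [...info.entries()]
    .sort((a, b) => (b[1].hot - a[1].hot) || (a[1].depth - b[1].depth))
    .map(([name]) => name)
    .slice(0, cutoff);
}
```

Under this model, scripts called from within a loop (and their subtrees) sort ahead of everything else, with shallower scripts preferred within each group.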
(In reply to comment #21)
> I was just speaking with Ehsan and I think that when he created this bug he had
> something a lot more modest in mind (as compared to ahead-of-time jitting and
> all the considerations that go along with that).
>
> On the desktop, he indicated that Emscripten code (e.g. game code) ran
> "acceptably" with reasonable frame rates, but would still be improved by simply
> being more aggressive about invoking ion compilation, but otherwise nothing too
> radical. I suspect that even this small change of correctly identifying this
> type of code and choosing Ion over JM would yield significant boost in the
> situations he is working with.
>
> On mobile, he indicated that Emscripten code is just really really slow, and
> games run at 1fps. I suspect that this issue has more to do with the quality
> of our codegen for ARM than with anything else. Here, too, simply finding the
> right times to be more aggressive about compiling with Ion might lead to some
> good payoffs.
>
> There are some clear culprits for slowdown on ARM. Proper handling of integer
> overflows, for example (which in emscripten generated code will just be
> truncated to 32 bits again) would be a biggie. Again, these sorts of
> optimizations are more easily implemented on Ion going down the line.

Yeah. I think even though focusing on things like ahead-of-time compilation etc. could be useful in the future, I'd like us to start a lot more conservatively for now. My goal in filing this bug was to see if we can eliminate the JM compilation and just directly compile to Ion if a function is called ~40 times and its previous type state matches the rough set of criteria from comment 0. Once we see that would work in practice and would not slow us down because of compilation overhead etc., then we can hopefully explore improving the heuristics, and doing optimizations which focus on Emscripten-specific (or whatever other compilers we care about) patterns in parallel. Does this make sense?
(In reply to Ehsan Akhgari [:ehsan] from comment #22)
> My goal in filing this bug was to see if we can
> eliminate the JM compilation and just directly compile to Ion if a function
> is called ~40 times and its previous type state matches the rough set of
> criteria from comment 0. Once we see that would work in practice and would
> not slow us down because of compilation overhead etc, then we can hopefully
> explore improving the heuristics, and doing optimizations which focus on
> Emscripten (or whatever other compilers we care about)-specific patterns in
> parallel.
>
> Does this make sense?

It sounds good to me, but I think it would totally depend on doing Ion compilation in a background thread (bug 774253), otherwise compilation freezes will kill us.
Can I suggest that we do those in parallel, and put it behind a pref until bug 774253 is complete? That way we can take the freeze hit but evaluate ion perf and the heuristics without waiting..
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #24)
> Can I suggest that we do those in parallel, and put it behind a pref until
> bug 774253 is complete? That way we can take the freeze hit but evaluate
> ion perf and the heuristics without waiting..

I'm currently working on bug 774253 and should have it in reviewable condition by tomorrow, so I don't think much waiting will be needed.
(In reply to comment #25)
> (In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #24)
> > Can I suggest that we do those in parallel, and put it behind a pref until
> > bug 774253 is complete? That way we can take the freeze hit but evaluate
> > ion perf and the heuristics without waiting..
>
> I'm currently working on bug 774253 and should have it in reviewable condition
> by tomorrow, so I don't think much waiting will be needed.

That's awesome!
Whiteboard: [js:t][js:ni]
How are things now with BC and asm.js?
Blocks: JSIL
Assignee: general → nobody
Severity: normal → S3

Emscripten-compiled code should target WASM these days.
Closing this bug.

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → INCOMPLETE