are two URLs from the jsmess project which is compiling MESS (an emulator for old devices) to JS using emscripten.
Both URLs load and use about 200MB in Chrome. In Nightly they use >1GB and the browser hangs, not rendering any frames.
David, you marked this as blocking WebJSPerf, I just want to make sure it's clear that this isn't a perf bug in the sense of speed. The page doesn't run at all and just hangs, it isn't that it's too slow (well, I guess it's infinitely slow ;)
(In reply to Alon Zakai (:azakai) from comment #1)
> David, you marked this as blocking WebJSPerf, I just want to make sure it's
> clear that this isn't a perf bug in the sense of speed. The page doesn't run
> at all and just hangs, it isn't that it's too slow (well, I guess it's
> infinitely slow ;)
I also added it to my triage list. But thanks for asking, because I had put it down as a perf bug. I changed it to a 'regression' (really user-facing bug) which has a higher priority.
I went to the page http://jsmess.textfiles.com and after closing the tab (because it was not doing much visibly) Firefox Nightly stopped responding.
It went so far as not saving the state properly, i.e. that tab was reopened on restore.
The link Hub just mentioned is also given in this blogpost describing the jsmess project,
Kind of sad that as described in there only Chrome can run the code.
A report on that blogpost now says that it works fine in Safari too.
I can reproduce this. According to perf, 82% of the time is spent in js::analyze::ScriptAnalysis::checkPendingValue, and 5% in js::analyze::ScriptAnalysis::analyzeLifetimes. So likely this is TI-related?
I looked at the smurfs link and the script which is taking so much time to analyze has 2 megabytes of bytecode and 20,000 local variables. This works fine in Chrome and Safari because they don't even try to generate optimized code here. This is not fixable by making algorithmic improvements to analyzeSSA, the script is just too large. It could be fixed by redesigning things so that SSA and inference are chunk-based (in addition to compilation), will think on how to do that without impacting the precision of the types.
As a temporary workaround to avoid the browser locking up and requiring a forced quit, can we not optimize those scripts, like V8 and JSC?
Ugh, this kind of case is deadly to IonMonkey as well (and I think Crankshaft too - which doesn't compile functions with more than 128 locals or so). Even with chunked compilation, IonMonkey's performance is heavily related to the number of locals because of bailouts.
Another idea would be to fall back to normal baseline JM compilation, if that's still possible.
(In reply to Brian Hackett (:bhackett) from comment #7)
> I looked at the smurfs link and the script which is taking so much time to
> analyze has 2 megabytes of bytecode and 20,000 local variables. This works
> fine in Chrome and Safari because they don't even try to generate optimized
> code here. This is not fixable by making algorithmic improvements to
> analyzeSSA, the script is just too large. It could be fixed by redesigning
> things so that SSA and inference are chunk-based (in addition to
> compilation), will think on how to do that without impacting the precision
> of the types.
I'd like to get this fixed for the next merge point, which is April 24. Is there any chance of chunked analysis, or do we need to fall back to baseline compilation? Is there a good way to identify scripts like this upfront, or do we need to use a timer?
Using baseline compilation as a fallback wouldn't work well, because TI would need to be disabled for the entire compartment. Another fallback is to just treat locals as escaping once they get beyond a certain threshold (as if they are also accessible via closures). Type specialized code would still be generated in this case, but types for such locals would be less precise (always includes undefined, doesn't distinguish between different places the same variable is used, though I doubt that's an issue here) and such locals could not be carried in registers --- all accesses are on memory. This is easy to try, will give it a go.
I'm hesitant about trying chunked analysis because it's not clear that's the right solution. We started chunking compilation, now considering chunking analysis too, will we end up wanting to chunk the initial bytecode compilation instead? Would unify all the various things we might consider doing in chunks, and would go along well with lazy bytecode compilation for that matter.
Created attachment 614796
give up on scripts with > 1000 args + locals
Haven't tested this in an opt browser yet, but in a debug one it allows analysis to complete and the game to be used. Give up on tracking the values of variables accurately when the # of args + locals exceeds 1000 (the first 1000 will still be tracked). Temporary bandaid.
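For readers following along, the cutoff described above can be sketched roughly as below. This is not the code in the attachment: `ScriptInfo`, `kMaxTrackedSlots`, `totalSlots` and `trackSlot` are hypothetical names, and the sketch assumes slots are numbered with args first and locals after them.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names throughout; this is not SpiderMonkey's actual API.
// Cutoff taken from the patch description: args + locals beyond the first
// 1000 are no longer tracked accurately.
constexpr uint32_t kMaxTrackedSlots = 1000;

struct ScriptInfo {
    uint32_t nargs;    // number of formal arguments
    uint32_t nlocals;  // number of local variables
};

// Assumption: slots are numbered args first, then locals.
inline uint32_t totalSlots(const ScriptInfo& script) {
    return script.nargs + script.nlocals;
}

// True if the analysis should keep precise information for this slot.
// Slots past the cutoff would be handled conservatively instead, much
// like variables that escape via a closure.
inline bool trackSlot(const ScriptInfo& script, uint32_t slot) {
    assert(slot < totalSlots(script));
    return slot < kMaxTrackedSlots;
}
```

With a script like the one in this bug (about 20,000 locals), only slots 0 through 999 would be tracked under this sketch, and a script below the cutoff is unaffected.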
I think now that chunked bytecode compilation will be the best long term solution here, it has the fewest pain points for keeping down space/time complexity in the face of truly gigantic scripts and seems cleaner than having separate chunking solutions for each backend pass. For the compilation issues in IM with scripts that have gigantic numbers of locals (which would still be the case with chunked bytecode sections), analysis could keep track of which locals are ever even mentioned in a chunk and filter that information through the bailout mechanism and other bits of the compiler whose performance is tied to script->nfixed.
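A rough illustration of the "which locals are ever mentioned in a chunk" idea, under assumptions that are not in the comment above: chunks are fixed-size ranges of bytecode offsets, and the accesses have already been collected into a flat list. `LocalAccess` and `localsMentionedPerChunk` are hypothetical names, not existing engine code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record: one bytecode instruction that reads or writes a local.
struct LocalAccess {
    uint32_t offset;  // bytecode offset of the instruction
    uint32_t local;   // index of the local it touches
};

// For each fixed-size chunk, return the sorted, de-duplicated list of locals
// mentioned inside it. Later passes could then size their per-chunk work by
// this list rather than by the script's total number of locals.
std::vector<std::vector<uint32_t>>
localsMentionedPerChunk(const std::vector<LocalAccess>& accesses,
                        uint32_t scriptLength, uint32_t chunkSize) {
    assert(chunkSize > 0);
    const uint32_t numChunks = (scriptLength + chunkSize - 1) / chunkSize;
    std::vector<std::vector<uint32_t>> perChunk(numChunks);

    for (const LocalAccess& access : accesses) {
        assert(access.offset < scriptLength);
        perChunk[access.offset / chunkSize].push_back(access.local);
    }

    for (std::vector<uint32_t>& locals : perChunk) {
        std::sort(locals.begin(), locals.end());
        locals.erase(std::unique(locals.begin(), locals.end()), locals.end());
    }
    return perChunk;
}
```

The sorted-vector representation is only one option; a bitset per chunk would be more compact when most locals are touched, while a short list is cheaper when each chunk mentions only a handful out of 20,000.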