Bug 735974 - jsmess games hang and use lots of RAM
Status: RESOLVED FIXED
Product: Core
Classification: Components
Component: JavaScript Engine
Version: unspecified
Platform: All / All
Importance: -- normal (2 votes)
Target Milestone: mozilla14
Assigned To: general
Blocks: WebJSPerf
Reported: 2012-03-14 20:45 PDT by Alon Zakai (:azakai)
Modified: 2012-12-30 09:56 PST

Attachments
give up on scripts with > 1000 args + locals (858 bytes, patch)
2012-04-13 08:30 PDT, Brian Hackett (:bhackett)
dvander: review+

Description Alon Zakai (:azakai) 2012-03-14 20:45:56 PDT
http://interbutt.com/temp/jsmess_cosmofighter.html
http://interbutt.com/temp/jsmess_smurfs.html

are two URLs from the jsmess project, which compiles MESS (an emulator for old devices) to JS using emscripten.

Both URLs load and use about 200MB in Chrome. In Nightly they use >1GB and the browser hangs, not rendering any frames.
Comment 1 Alon Zakai (:azakai) 2012-03-15 17:06:47 PDT
David, you marked this as blocking WebJSPerf, I just want to make sure it's clear that this isn't a perf bug in the sense of speed. The page doesn't run at all and just hangs, it isn't that it's too slow (well, I guess it's infinitely slow ;)
Comment 2 David Mandelin [:dmandelin] 2012-03-15 17:08:56 PDT
(In reply to Alon Zakai (:azakai) from comment #1)
> David, you marked this as blocking WebJSPerf, I just want to make sure it's
> clear that this isn't a perf bug in the sense of speed. The page doesn't run
> at all and just hangs, it isn't that it's too slow (well, I guess it's
> infinitely slow ;)

I also added it to my triage list. But thanks for asking, because I had put it down as a perf bug. I changed it to a 'regression' (really user-facing bug) which has a higher priority.
Comment 3 Hubert Figuiere [:hub] 2012-03-25 20:20:26 PDT
I went to the page http://jsmess.textfiles.com and after closing the tab (because it was not doing much visibly) Firefox Nightly stopped responding.

It went so far as to not save the state properly, i.e. that tab was reopened on restore.
Comment 4 Alon Zakai (:azakai) 2012-03-26 18:12:16 PDT
The link Hub just mentioned is also given in this blogpost describing the jsmess project,

http://ascii.textfiles.com/archives/3502

Kind of sad that, as described there, only Chrome can run the code.
Comment 5 Alon Zakai (:azakai) 2012-03-26 18:41:25 PDT
A report on that blogpost now says that it works fine in Safari too.
Comment 6 David Anderson [:dvander] 2012-03-27 15:46:53 PDT
I can reproduce this. According to perf, 82% of the time is spent in js::analyze::ScriptAnalysis::checkPendingValue, and 5% in js::analyze::ScriptAnalysis::analyzeLifetimes. So likely this is TI-related?
Comment 7 Brian Hackett (:bhackett) 2012-04-04 08:40:19 PDT
I looked at the smurfs link and the script which is taking so much time to analyze has 2 megabytes of bytecode and 20,000 local variables.  This works fine in Chrome and Safari because they don't even try to generate optimized code here.  This is not fixable by making algorithmic improvements to analyzeSSA, the script is just too large.  It could be fixed by redesigning things so that SSA and inference are chunk-based (in addition to compilation), will think on how to do that without impacting the precision of the types.
Comment 8 Alon Zakai (:azakai) 2012-04-04 10:08:23 PDT
As a temporary workaround to avoid the browser locking up and requiring a forced quit, can we not optimize those scripts, like V8 and JSC?
Comment 9 David Anderson [:dvander] 2012-04-06 11:30:13 PDT
Ugh, this kind of case is deadly to IonMonkey as well (and I think Crankshaft too - which doesn't compile functions with more than 128 locals or so). Even with chunked compilation, IonMonkey's performance is heavily related to the number of locals because of bailouts.

Another idea would be to fall back to normal baseline JM compilation, if that's still possible.
Comment 10 David Mandelin [:dmandelin] 2012-04-12 12:06:26 PDT
(In reply to Brian Hackett (:bhackett) from comment #7)
> I looked at the smurfs link and the script which is taking so much time to
> analyze has 2 megabytes of bytecode and 20,000 local variables.  This works
> fine in Chrome and Safari because they don't even try to generate optimized
> code here.  This is not fixable by making algorithmic improvements to
> analyzeSSA, the script is just too large.  It could be fixed by redesigning
> things so that SSA and inference are chunk-based (in addition to
> compilation), will think on how to do that without impacting the precision
> of the types.

I'd like to get this fixed for the next merge point, which is April 24. Is there any chance of chunked analysis, or do we need to fall back to baseline compilation? Is there a good way to identify scripts like this upfront, or do we need to use a timer?
Comment 11 Brian Hackett (:bhackett) 2012-04-12 13:07:57 PDT
Using baseline compilation as a fallback wouldn't work well, because TI would need to be disabled for the entire compartment.  Another fallback is to just treat locals as escaping once they get beyond a certain threshold (as if they are also accessible via closures).  Type specialized code would still be generated in this case, but types for such locals would be less precise (always includes undefined, doesn't distinguish between different places the same variable is used, though I doubt that's an issue here) and such locals could not be carried in registers --- all accesses would go through memory.  This is easy to try, will give it a go.

I'm hesitant about trying chunked analysis because it's not clear that's the right solution.  We started chunking compilation, and now we're considering chunking analysis too; will we end up wanting to chunk the initial bytecode compilation instead?  That would unify all the various things we might consider doing in chunks, and would go along well with lazy bytecode compilation for that matter.
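
A minimal, self-contained sketch of the fallback described in this comment (treating locals past a fixed threshold as escaping, so they are not tracked precisely and always live in memory). This is not SpiderMonkey code; the names ScriptStats, slotIsTracked, and LOCAL_LIMIT, and the 1000-slot cutoff wired into it, are illustrative assumptions only.

// Sketch only: model a per-slot "is this local tracked precisely?" decision.
#include <cstdint>
#include <iostream>

// Hypothetical cap on how many args + locals get precise SSA/type tracking.
constexpr uint32_t LOCAL_LIMIT = 1000;

struct ScriptStats {
    uint32_t nargs;   // number of formal arguments
    uint32_t nfixed;  // number of fixed local slots
};

// A slot is "tracked" only if it falls below the limit; everything past it
// is handled as if it escaped into a closure: no register allocation, and
// its inferred type always includes undefined.
bool slotIsTracked(const ScriptStats& script, uint32_t slot) {
    (void)script;
    return slot < LOCAL_LIMIT;
}

int main() {
    // Roughly the shape of the jsmess script from comment 7:
    // ~20,000 locals in one function.
    ScriptStats huge{4, 20000};

    uint32_t tracked = 0;
    for (uint32_t slot = 0; slot < huge.nargs + huge.nfixed; slot++) {
        if (slotIsTracked(huge, slot))
            tracked++;
    }
    std::cout << "precisely tracked slots: " << tracked << " of "
              << (huge.nargs + huge.nfixed) << "\n";
    return 0;
}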
Comment 12 Brian Hackett (:bhackett) 2012-04-13 08:30:15 PDT
Created attachment 614796 [details] [diff] [review]
give up on scripts with > 1000 args + locals

Haven't tested this in an opt browser yet, but it allows analysis to complete in a debug one and the game to be used.  Give up on tracking the values of variables accurately when the number of args + locals exceeds 1000 (the first 1000 will still be tracked).  Temporary band-aid.

I now think that chunked bytecode compilation will be the best long term solution here; it has the fewest pain points for keeping down space/time complexity in the face of truly gigantic scripts, and seems cleaner than having separate chunking solutions for each backend pass.  For the compilation issues in IM with scripts that have gigantic numbers of locals (which would still be the case with chunked bytecode sections), analysis could keep track of which locals are ever even mentioned in a chunk and filter that information through the bailout mechanism and other bits of the compiler whose performance is tied to script->nfixed.
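
A toy sketch of the per-chunk bookkeeping suggested at the end of the comment above: record which locals a bytecode chunk actually mentions, so bailout and compiler data structures scale with the locals a chunk uses rather than with script->nfixed. This is not the attached patch or real SpiderMonkey code; ChunkLocalUse and its members are hypothetical names.

// Sketch only: per-chunk set of locals actually referenced by that chunk.
#include <cstdint>
#include <iostream>
#include <vector>

struct ChunkLocalUse {
    std::vector<bool> mentioned;  // one flag per local slot in the script

    explicit ChunkLocalUse(uint32_t nfixed) : mentioned(nfixed, false) {}

    void noteUse(uint32_t slot) { mentioned[slot] = true; }

    // Only the locals actually touched by this chunk would need to be
    // carried through bailout/snapshot machinery.
    uint32_t liveCount() const {
        uint32_t n = 0;
        for (bool b : mentioned)
            if (b) n++;
        return n;
    }
};

int main() {
    // A script with 20,000 locals, but a chunk that only touches a handful.
    ChunkLocalUse chunk(20000);
    chunk.noteUse(3);
    chunk.noteUse(17);
    chunk.noteUse(4096);

    std::cout << "locals this chunk must care about: " << chunk.liveCount()
              << " of 20000\n";
    return 0;
}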
Comment 13 Brian Hackett (:bhackett) 2012-04-15 21:39:13 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/67ca169a52d2
