10.43 KB, patch
|Details | Diff | Splinter Review|
Will attach a testcase that OOMs during compiling asm.js code on my machine. It uses over 5GB then fails.
So far these OOMs have been from gargantuan functions. Is there any way you can print these out from Emscripten and see if there are any here?
I wrote a tool to count, and looks like one function, _yyparse, is 53,665 lines long... What is our thinking about such situations? We don't have a good way to know what the right size is to start cutting up functions in emscripten, and even if we tried then we might cause big slowdowns if we cut in the wrong way. So I am hoping there is something we can do on the other side. At minimum, perhaps abort the compilation and give a validation failure? Btw, why does even 53K lines of code end up taking over 5GB of memory? That's 100K per line.)
(In reply to Alon Zakai (:azakai) from comment #4) > At minimum, > perhaps abort the compilation and give a validation failure? > (because with odin disabled, it does run ok in the normal code path)
(In reply to Alon Zakai (:azakai) from comment #4) Ah, yyparse, that is the output of yacc, right? > So I am hoping there is something we can do on the other side. At minimum, > perhaps abort the compilation and give a validation failure? Yeah, there should be a validation error, I'll check why there isn't. > Btw, why does even 53K lines of code end up taking over 5GB of memory? > That's 100K per line.) (How many locals does _yyparse have?) If the problem is what I think it is, we are allocating a phi for each local variable at various control flow statements (loop headers, branch joins). This happens because we speculatively emit phis for all locals at many points and then use a later pass to through away unnecessary ones. (V8 does the same thing, I thought; general IM/V8 don't have this problem only because _yyparse isn't being compiled.) > What is our thinking about such situations? We've already seen this issue before (_yy_reduce in sqlite, iirc) and are planning to investigate to see if there is any reasonable fixes. Bad phi behavior already motivated Sean to land one good O(n)-to-O(1) fix; maybe there is something else we can do here. However, in general, it is a bad idea for Emscripten to generate >10KLOC functions; it's just asking for something quadratic in some browser's compilation backend. Ideally, I think, Emscripten would (1) warn for such functions, advising the user to change their code, (2) offer an "out-of-line functions list" argument that lets the user list functions to pull out of asm.js into normal JS.
Could be yacc, yeah. Emscripten could warn on very large functions. The question is how much? This will vary between parser to parser and machine to machine. Out of lining functions is quite hard actually in the current architecture, I decided not to pursue that.
Sean, Jan (hehe, http://en.wikipedia.org/wiki/Sean_John): do you think there are any simple fixes for these pathologically large testcases like Alon attached in comment 2? The problem is being exacerbated Odin unconditionally ion-compiling everything, to be sure, but it seems like if we could make phis scale better we might be able to bump our script compilation size limits again.
Doesn't anyone look at Yacc or Bison output these days? I blinded myself when young doing so, then writing my own LALR(1) generator to emit prettier code (my eyes got better ;-). /be
Wow, I looked at this again, _yyparse is *madness*! It contains >11,000 local variables! 53,666 lines! Surely there can't be 11,000 live values; does Emscripten do anything like register allocation to reuse same-typed variables?
Emscripten does not run the register allocator in a -g build (like this one is), because the registerize phase makes the code unreadable/undebuggable. I can try to get a non -g build from the author.
There is an optimized build on the author's demo page, which should have registerize run on it, http://qiezi.me/projects/webruby-build/webruby.js Looks like parsing fails on that, I get "too much recursion" (similar to bug 868884).
Created attachment 747234 [details] [diff] [review] process if-else chains without recursion Independent of the other issues, parsing yyparse throws "over recursed" because of a super-long if/else-if/else-if/... chain. This is a simple fix, just turning the recursion on the 'else' branch into a loop. It also creates far less join blocks (one at the end instead of one for each else-if) which reduces memory usage (still not to acceptable levels, but better).
Created attachment 747235 [details] ruby (updated to validate with new asm.js rules)
With the patch in comment 13 applied, memory climbs up and stays at 4.3GB. Compilation takes 1m18s. I'll dig in a bit further to see which phases we spend all the time and if phis are still all the memory.
Looking at webruby.js, there are still ~4400 local variables in yyparse (now named 'd9'). About 1.4GB of phi nodes are created (1/2 are for backedges, 1/2 are for branch joins), whereas resident stays at 4.3GB, so this isn't all just phi nodes. On the insane compile time, almost all of it is in d9 (yyparse). The coarse breakdown is: total d9 time: 73364ms validate/generate MIR: 2069ms optimize MIR: 6204ms generate LIR and regalloc: 61053ms codegen: 4037ms Thus, the time is mostly regalloc. Trying --ion-regalloc=stupid dropped the regalloc time to 20s. I had to ctrl-C --ion-regalloc=backtracking after 15 minutes (perhaps it's an iloop bug). So this isn't just one problem in Ion. I think the most reliable solution is automatic out-of-lining of large functions by Emscripten.
As discussed offline (summarizing here for everyone else's benefit), out-of-lining will require emscripten to properly implement a separate compilation story. We have plans for that but it won't be in the immediate future. Also there are concerns with using out-of-lining as a solution for this type of problem, since "excessively large function size" will obviously differ between JS engines. Optimally if a JS engine decides a function is too large for it to compile, it would do something else to that function in an automatic way. (Is it not possible given the background thread compilation infrastructure to abort a compilation that runs too long, and re-do it with fewer or no optimizations?) Side note, we got lucky with UE3 not hitting any problems related to this, but as we are seeing more large codebases being compiled like this one, we will need to work to address this stuff both in emscripten and in JS engines I think.
(In reply to Alon Zakai (:azakai) from comment #19) > (Is it not possible given the background thread compilation infrastructure > to abort a compilation that runs too long, and re-do it with fewer or no > optimizations?) Well, what you can see from the data below is that all the phases are taking a ton of time and memory and we don't have sufficient knobs to turn them all down. Pathological functions like this just need to stay in the baseline jit. > Side note, we got lucky with UE3 not hitting any problems related to this, > but as we are seeing more large codebases being compiled like this one, we > will need to work to address this stuff both in emscripten and in JS engines > I think. The only thing we could do in JS that wasn't just dampening the effect is to support baseline-compilation from Odin. The problem is that baseline compilation requires a JSScript and generating the JSScript is a very expensive thing to do (we have to generate one for the entire asm.js module; we saved like 6 seconds on UE3 by avoiding JSScript creation).
(In reply to Luke Wagner [:luke] from comment #20) > > The only thing we could do in JS that wasn't just dampening the effect is to > support baseline-compilation from Odin. The problem is that baseline > compilation requires a JSScript and generating the JSScript is a very > expensive thing to do (we have to generate one for the entire asm.js module; > we saved like 6 seconds on UE3 by avoiding JSScript creation). I was hoping we could just baseline in such cases, but I guess baseline is not feasible here :( Would it be possible to turn the function into an ffi call, so the JSScript is just for it?
(In reply to Alon Zakai (:azakai) from comment #21) > I was hoping we could just baseline in such cases, but I guess baseline is > not feasible here :( Baseline is feasible (and we'd do what you were suggesting, use an FFI call). The hard part is getting a JSScript on demand. On second thought, though, this isn't as bad as I said in comment 20. Before the asm.js parser, we still have the parse tree for the offending function, so it's just a matter of appeasing the emitter; this might even be easy given the lazy parsing work which has a similar problem. After the asm.js parser, though, which apparently is getting close, it'll be harder. But overall, I'm feeling more like this is what we need to do in the medium-term to make Odin industrial strength. I still want Emscripten out-of-lining, though :) The reason is that Odin should be very conservative about falling back to non-AOT and devs may want to be more aggressive about, e.g., out-of-lining everything that is only run during initialization.
Agreed, we should work to solve this on both sides. On emscripten as I said, we will get this together with separate compilation aka modules, likely over this summer either by an intern or me.
https://hg.mozilla.org/integration/mozilla-inbound/rev/1b0bfd72e825 (This patch just makes if/else-if checking iterative, fixing the "over recursed" error; leaving open for the general problem of categorically dealing with over-large functions.)
Actually, one patch per bug, so I'll move work on the general solution to bug 875174.