Closed Bug 1191061 Opened 9 years ago Closed 9 years ago

OdinMonkey OOMs on makethingsnow/minecraft

Categories

(Core :: JavaScript Engine: JIT, defect)

24 Branch
defect
Not set
normal

RESOLVED INVALID

People

(Reporter: azakai, Unassigned)

Details

(Whiteboard: [MemShrink])

makethingsnow.com/minecraft/

works when OdinMonkey is disabled, and on Chrome. On Firefox with OdinMonkey enabled, memory usage very quickly jumps by a few GB (far more than it uses with OdinMonkey off), and my machine runs out of memory.
That would probably be the function Sf, which is 0.7 MB (1/3 of the file), contains ~2300 local variables, and took about 91s on my machine to compile (it did complete successfully).  Once compilation finished, though, the game ran much smoother (when fully zoomed out) on FF than on Chrome.

The baseline compiler (bug 1169167) should allow us to mitigate the problem.  In addition to producing baseline code fast, the background Ion compilation can monitor LifoAlloc memory usage and, past a threshold, fall back to a baseline compilation of just that function.  This would hurt performance of the final code, though.  It'd be nice to look into optimizations to reduce memory usage in these cases that combine many local slots with a huge function.
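
To illustrate the threshold idea above, here is a minimal sketch in Python. It is not SpiderMonkey code: `TrackedAllocator`, `BudgetExceeded`, and `compile_function` are hypothetical names, and a single `allocate` call stands in for all the memory an optimizing compilation would request.

```python
class BudgetExceeded(Exception):
    """Raised when the tracked allocator passes its memory threshold."""


class TrackedAllocator:
    """Hypothetical stand-in for an arena whose usage can be monitored."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def allocate(self, nbytes):
        # Record the allocation, then check it against the budget.
        self.used_bytes += nbytes
        if self.used_bytes > self.limit_bytes:
            raise BudgetExceeded(self.used_bytes)


def compile_function(name, optimizing_cost_bytes, limit_bytes):
    """Try the optimizing tier; fall back to baseline for this function only."""
    alloc = TrackedAllocator(limit_bytes)
    try:
        # Stands in for building the optimizing compiler's IR.
        alloc.allocate(optimizing_cost_bytes)
        return (name, "optimized")
    except BudgetExceeded:
        # Over budget: give up on the optimizing tier for this one function.
        return (name, "baseline")
```

With a 1 GB budget, a function needing about 5 GB would come back as `("Sf", "baseline")`, while small functions would still get the optimizing tier.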
Depends on: 1169167
(In reply to Luke Wagner (PTO) [:luke] from comment #1)
> It'd be nice to look into optimizations to mitigate
> the memory usage in these many local slots x huge function cases.

Yeah, I think it'd be interesting to find out where these GBs go. I'll look into this today, there might be some easy wins that'd also be nice for mobile.
On OS X 64-bit, we have the following LifoAlloc sizes when compiling the big function:

After generating MIR: 4822 MB
After optimizing MIR: 5021 MB
After generating LIR: 5035 MB
After regalloc:       5648 MB
After codegen:        5648 MB

Regalloc uses 600 MB but other than that the backend seems pretty memory-efficient. I'll find out what the initial 4.8 GB is.
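
The per-phase cost can be read off the cumulative numbers above by subtracting each total from the next one. A small sketch (the helper name `phase_deltas` is made up for illustration):

```python
# Cumulative LifoAlloc sizes in MB, as reported above.
sizes_mb = [
    ("generating MIR", 4822),
    ("optimizing MIR", 5021),
    ("generating LIR", 5035),
    ("regalloc", 5648),
    ("codegen", 5648),
]


def phase_deltas(sizes):
    """Return (phase, MB added by that phase) for a list of cumulative totals."""
    deltas = []
    previous = 0
    for phase, total in sizes:
        deltas.append((phase, total - previous))
        previous = total
    return deltas


for phase, added in phase_deltas(sizes_mb):
    print(f"{phase:>15}: +{added} MB")
```

This gives +4822 MB for MIR generation, +199 MB for MIR optimization, +14 MB for LIR generation, +613 MB for regalloc, and +0 MB for codegen.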
(In reply to Jan de Mooij [:jandem] from comment #3)
> I'll find out what the initial 4.8 GB is.

I think most of this is phis... With > 2300 phis and sizeof(MPhi) a bit more than 200 bytes, that's 500 KB per basic block. Not sure but it seems we have 5082 basic blocks, so just the phis take > 2 GB.
That matches what I've seen before when looking into these mega-memory cases.  IIRC, most of these are being introduced for loops (since we pessimistically insert phis for all local slots before entering the loop body, and then only at the end drop useless phis).  What if, at the end of the loop, instead of leaving the useless phis dead, we added them to some free list of phis that was reused?  That is, I'm guessing that 2 GB is mostly full of dead phis.
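
A rough sketch of the free-list idea, in Python rather than the engine's C++. All names here (`Phi`, `PhiPool`, `build_loop`, `count_allocations`) are hypothetical, and "useless" is simplified to "the loop body does not modify that slot":

```python
class Phi:
    def __init__(self, slot):
        self.slot = slot
        self.operands = []


class PhiPool:
    """Hands out Phi objects, reusing released ones before creating new ones."""

    def __init__(self):
        self.free = []
        self.allocated = 0  # number of Phi objects ever created

    def acquire(self, slot):
        if self.free:
            phi = self.free.pop()
            phi.slot = slot
            phi.operands = []
            return phi
        self.allocated += 1
        return Phi(slot)

    def release(self, phi):
        self.free.append(phi)


def build_loop(pool, num_locals, modified_slots):
    """Insert one phi per local slot, then return the useless ones to the pool."""
    phis = [pool.acquire(slot) for slot in range(num_locals)]
    kept = []
    for phi in phis:
        if phi.slot in modified_slots:
            kept.append(phi)   # the loop really needs this phi
        else:
            pool.release(phi)  # dead phi: make it available to the next loop
    return kept


def count_allocations(num_locals, loops):
    """Total Phi objects created when compiling the given loops in sequence."""
    pool = PhiPool()
    for modified in loops:
        build_loop(pool, num_locals, modified)
    return pool.allocated
```

Under these assumptions, three loops over 2300 locals that each modify two slots create 2304 phi objects in total, instead of 6900 without reuse.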
(In reply to Jan de Mooij [:jandem] from comment #4)
> (In reply to Jan de Mooij [:jandem] from comment #3)
> > I'll find out what the initial 4.8 GB is.
> 
> I think most of this is phis... With > 2300 phis and sizeof(MPhi) a bit more
> than 200 bytes, that's 500 KB per basic block. Not sure but it seems we have
> 5082 basic blocks, so just the phis take > 2 GB.

Sorry if this is a silly question, but does that mean that memory usage is numPhis*numBasicBlocks*sizeofPhi? In other words, adding one phi adds memory proportional to the number of basic blocks in the function?
Whiteboard: [MemShrink]
Almost, it's:
 (numLocalVars * numLoops * sizeofPhi) + (numerOfActualPhisNeededForNonLoops * sizeofPhi)
I see, thanks. Then plugging in the numbers, this implies that to reach 2GB we probably need either

1. Around 5,000 loops (to get the first expression to the right range), or
2. Around 10 million numerOfActualPhisNeededForNonLoops (to get the second)

Both seem surprising?
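
The two estimates above can be checked by plugging the figures quoted earlier in this bug into the formula. This sketch assumes ~2300 local variables and rounds sizeof(MPhi) to 200 bytes; the function name `phi_bytes` is made up for illustration:

```python
def phi_bytes(num_local_vars, num_loops, num_non_loop_phis, sizeof_phi):
    """Phi memory according to the formula given above."""
    return (num_local_vars * num_loops * sizeof_phi) + (num_non_loop_phis * sizeof_phi)


NUM_LOCAL_VARS = 2300      # from the description of function Sf
SIZEOF_PHI = 200           # "a bit more than 200 bytes", rounded down
TARGET_BYTES = 2 * 10**9   # the ~2 GB attributed to phis

# Case 1: loops alone account for the target.
loops_needed = TARGET_BYTES / (NUM_LOCAL_VARS * SIZEOF_PHI)

# Case 2: non-loop phis alone account for the target.
non_loop_phis_needed = TARGET_BYTES / SIZEOF_PHI

print(f"loops needed:         {loops_needed:,.0f}")
print(f"non-loop phis needed: {non_loop_phis_needed:,.0f}")
```

This works out to roughly 4,348 loops or 10,000,000 non-loop phis, in line with the "around 5,000" and "around 10 million" figures.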
(In reply to Alon Zakai (:azakai) from comment #8)
> 1. Around 5,000 loops (to get the first expression to the right range), or

I double checked and there are indeed 5082 "pending loop header" blocks, out of ~28312 blocks... Looking at the asm.js code, there are a *ton* of loops like this one:

do {
    a[qs >> 0] = a[js >> 0] | 0;
    qs = qs + 1 | 0;
    js = js + 1 | 0
} while ((qs | 0) < (rs | 0));
Wow, thanks. I'll pass that along to the project, maybe they can inline less or something like that.
Is this something to fix on our side, or on their side?
Probably more on their side. Also, the site now works fine on nightly, so they may have already done some optimizing.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → INVALID