LICM causes enormous localSlotCount on emterpreter function

NEW
Unassigned

P5
normal
4 years ago
2 years ago

People

(Reporter: azakai, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments)

286.89 KB, application/javascript
3.65 KB, application/octet-stream
(Reporter)

Description

4 years ago
Created attachment 8580799 [details]
a.out.js

The attached testcase runs out of stack in odin, but not when odin is disabled. Luke knows more details.
(Reporter)

Comment 1

4 years ago
Created attachment 8580800 [details]
a.out.js.mem

mem init for that testcase (required by emterpreter)
The emterpreter function only has a handful of vars, but localSlotCount() is more than 10k (in another testcase I was looking at), so it seems like somehow regalloc is consuming an inordinate amount of spill space.
Summary: "too much recursion" in odin on emterpreter testcase → enormous localSlotCount on emterpreter testcase
Flags: needinfo?(bhackett1024)
How should I run this testcase?  I don't get a stack overflow when running 'js a.out.js a.out.js.mem' and the stack space used by the backtracking allocator seems reasonable (max 500 bytes in a frame).
(Reporter)

Comment 4

4 years ago
Just placing the two files in the same dir, and running     js a.out.js     should show the problem (the js loads the .mem file).

I can still see the problem on a new build of mozilla-inbound.
OK, I'm able to reproduce the error on x86 (but not x64).

(In reply to Luke Wagner [:luke] from comment #2)
> The emterpreter function only has a handful of vars, but localSlotCount()
> is more than 10k (in another testcase I was looking at), so it seems like
> somehow regalloc is consuming an inordinate amount of spill space.

I don't know about the other testcase, but in this one on x86 I see regalloc using up 520 bytes for stack slots on the emterpret() function.  Is there one frame of emterpret() for each frame of the bytecode program (or whatever) that is being interpreted?

The fact this works with Odin disabled is accidental.  If I run the x86 shell with --no-asmjs --no-threads --ion-limit-script-size=off I get a too much recursion error as well, and similar (though not quite as high) stack consumption in the emterpret() function.  I guess that with our default behavior we execute enough of the testcase in baseline (while the emterpret() function warms up and is Ion compiled off thread) that we avoid triggering the error.

I don't think the stack consumption in emterpret() is a problem with the register allocator.  emterpret() looks like it is a big switch statement inside a while(true) loop (i.e. an interpreter loop).  I looked at the regalloc spew and added some printfs and see that almost all the stack slots are being used by vregs that are simultaneously live for the entire interpreter loop, and require different stack slots.  These vregs are defined in the interpreter loop's preheader --- they are loop invariant terms that have been hoisted.  If I run the shell with --ion-licm=off then the testcase completes with no error (both with and without Odin) and the Odin version of emterpret() uses only 28 bytes of stack.
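
To make the hoisting effect concrete, here is a rough sketch of the pattern. It is a toy interpreter, not the real emterpret() from the attachment; the opcodes, the `base` parameter, and the arithmetic are all made up for illustration. The second function is a hand-written picture of what hoisting loop-invariant terms into the preheader amounts to, not actual Ion output.

```javascript
// Toy interpreter loop; opcodes and operands are hypothetical.
const OP_ADD_A = 0, OP_ADD_B = 1, OP_ADD_C = 2, OP_HALT = 3;

// As written: each case computes its loop-invariant term only when it runs,
// so at most one of these values needs to be live at any time.
function interpret(code, base) {
  let pc = 0, acc = 0;
  while (true) {
    switch (code[pc++]) {
      case OP_ADD_A: acc += base * 3 + 1; break;
      case OP_ADD_B: acc += base * 5 + 2; break;
      case OP_ADD_C: acc += base * 7 + 3; break;
      case OP_HALT: return acc;
      default: throw new Error("bad opcode");
    }
  }
}

// After hoisting (written by hand): every invariant term is computed in the
// preheader and stays live for the whole loop, so each one needs its own
// register or stack slot. With hundreds of cases, that means hundreds of slots.
function interpretHoisted(code, base) {
  const t0 = base * 3 + 1;
  const t1 = base * 5 + 2;
  const t2 = base * 7 + 3;
  let pc = 0, acc = 0;
  while (true) {
    switch (code[pc++]) {
      case OP_ADD_A: acc += t0; break;
      case OP_ADD_B: acc += t1; break;
      case OP_ADD_C: acc += t2; break;
      case OP_HALT: return acc;
      default: throw new Error("bad opcode");
    }
  }
}

const program = [OP_ADD_A, OP_ADD_B, OP_ADD_C, OP_ADD_A, OP_HALT];
console.log(interpret(program, 10), interpretHoisted(program, 10));
```

Both versions compute the same result; the difference is only in how many values are live across the loop, which is what drives the stack slot count.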
Flags: needinfo?(bhackett1024)
Thanks for investigating!  If this testcase only uses 520 bytes of stack space, then perhaps this testcase is not representative of the original problem (which had 20x stack usage).  Alon, do you still have the original?  It'd be good to test with --ion-licm=off on it; LICM sounds like a likely culprit.
(Reporter)

Comment 7

4 years ago
I found the email thread and will forward it (it's a private UE4 build). However, I don't seem to see the problem on it; it appears to fail with other errors now. Either the build in the link was updated, or it isn't a problem anymore.
Ah hah, it is indeed LICM: with --ion-licm=on, paddedLocalSlotsSize = 11232, with --ion-licm=off, paddedLocalSlotsSize = 32.

So it seems like we need some heuristics to limit hoisting in these monster-loop cases.
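
One possible shape for such a heuristic, as a sketch only: hoist everything in ordinary loops, but in very large loops cap the number of hoisted terms and prefer the ones with the most uses. The function name, the candidate fields (`name`, `useCount`), and both thresholds are hypothetical and are not taken from SpiderMonkey's LICM code.

```javascript
// Hypothetical hoisting filter; names and thresholds are assumptions.
// candidates: array of { name, useCount } describing loop-invariant terms.
// loopInstructionCount: rough size of the loop body.
function selectHoistCandidates(candidates, loopInstructionCount, options = {}) {
  const { largeLoopThreshold = 1000, maxHoistedInLargeLoop = 8 } = options;

  // Ordinary loops: hoist every invariant term, as LICM normally would.
  if (loopInstructionCount <= largeLoopThreshold) {
    return candidates.slice();
  }

  // Monster loops: keep only the most-used terms so that the number of
  // values live across the whole loop stays bounded.
  return candidates
    .slice()
    .sort((a, b) => b.useCount - a.useCount)
    .slice(0, maxHoistedInLargeLoop);
}

const demo = [
  { name: "a", useCount: 1 },
  { name: "b", useCount: 5 },
  { name: "c", useCount: 3 },
];
console.log(selectHoistCandidates(demo, 5000, { maxHoistedInLargeLoop: 2 }));
```

Use count is just one plausible ranking; a real heuristic might instead look at how many loop blocks the term dominates or at estimated register pressure.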

Updated

4 years ago
Summary: enormous localSlotCount on emterpreter testcase → LICM causes enormous localSlotCount on emterpreter function
Priority: -- → P5