Closed Bug 1409124 Opened 7 years ago Closed 2 years ago

Large stack overflow limit differences between wasm baseline and ion

Categories

(Core :: JavaScript: WebAssembly, defect, P5)


Tracking


RESOLVED INACTIVE

People

(Reporter: azakai, Unassigned)


Attachments

(2 files)

86 bytes, application/octet-stream
819 bytes, application/x-javascript
Attached file w.wasm
This might be working as intended, but it's a noticeable difference between the wasm baseline and ion: the recursion depth we can reach before overflowing differs by orders of magnitude.

Adjusting the global initial value in the testcase will set how many recursions we do before we manually stop, and then the question is did we hit the VM limit before that. If the limit is 10, baseline and ion behave the same (we don't hit the VM limit). At 100 ion hits the VM limit but baseline does not. Same at 1000 and 10,000. Only at 100,000 do we hit the baseline VM limit.

The 100 vs 100,000 difference seems large enough that this might be noticed in practice.
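For reference, a rough reconstruction of that kind of testcase (hypothetical; this is not the attached w.wasm/a.js, and the WAT and harness below are assumptions), runnable in the JS shell with wasmTextToBinary, once per compiler via the --no-wasm-XXX flags:

  // Hypothetical reconstruction: recurse until a mutable global counter
  // reaches zero; the global's initial value sets the recursion count.
  var wat = `(module
    (global $n (mut i32) (i32.const 100))
    (func $rec (export "rec")
      (if (i32.gt_s (global.get $n) (i32.const 0))
        (then
          (global.set $n (i32.sub (global.get $n) (i32.const 1)))
          (call $rec)))))`;
  var ins = new WebAssembly.Instance(
      new WebAssembly.Module(wasmTextToBinary(wat)));  // shell builtin
  try {
    ins.exports.rec();
    print("finished all recursions before hitting the VM limit");
  } catch (e) {
    print("hit the VM stack limit first: " + e);  // e.g. "too much recursion"
  }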
Attached file a.js
One categorical difference is that baseline allocates a stack slot for each local and then a stack slot for the max stack depth.  In Ion, the only stack usage is what the regalloc needs to spill.  So you could easily have a 1,000-local function that takes 4kb per frame on baseline and 8 bytes on ion if those 1,000 locals are not all live at the same time.
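To make that concrete, a hypothetical sketch (not one of the attachments) that generates a function with 1,000 i32 locals, each written once and never simultaneously live; per the above, baseline would reserve a frame slot per local (roughly 4 KB here) while Ion spills only what the register allocator needs:

  // Hypothetical sketch: build a wasm function with n i32 locals,
  // each defined once and never live at the same time as the others.
  function manyLocalsWat(n) {
    var locals = "(local " + Array(n).fill("i32").join(" ") + ")";
    var body = "";
    for (var i = 0; i < n; i++)
      body += "(local.set " + i + " (i32.const 0)) ";
    return "(module (func (export \"f\") " + locals + " " + body + "))";
  }
  var m = new WebAssembly.Module(wasmTextToBinary(manyLocalsWat(1000)));
  new WebAssembly.Instance(m).exports.f();
  // Baseline frame: a slot for each of the 1,000 locals plus max stack depth;
  // Ion frame: only what the regalloc needs to spill (a few bytes here).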
(In reply to Alon Zakai (:azakai) from comment #0)
> Created attachment 8918975 [details]
> 
> Adjusting the global initial value in the testcase will set how many
> recursions we do before we manually stop, and then the question is did we
> hit the VM limit before that. If the limit is 10, baseline and ion behave
> the same (we don't hit the VM limit). At 100 ion hits the VM limit but
> baseline does not. Same at 1000 and 10,000. Only at 100,000 do we hit the
> baseline VM limit.

Aren't the names of the two compilers reversed in the paragraph above (possibly confused by the use of the --no-wasm-XXX flags)?

Looking at the machine code generated by the two compilers, baseline allocates very large stack frames and should hit the limit quickly.  This is a consequence of the one-pass strategy that assumes the input is not adversarial.

We can imagine ways in which we can improve that if we think it's an important use case.  Earlier I added code to not unroll frame initialization fully for large frames since we could blow the code budget, but analyzing the frame to pack it (say) requires maintaining a mapping from logical offset to actual offset and some kind of patching of the init code, if we're going to maintain the single-pass quality.  It can clearly be done.  We could conceivably only enable it for a large number of locals relative to bytecode size.  It adds complexity.

The more sophisticated packing, tracking live ranges, is definitely out of bounds for baseline.  So even on non-adversarial code (eg, -O0 code) we might run into this discrepancy.
Yeah, sorry, I flipped the compiler names in the text there.

Thanks for the details. Overall, though, I am still a little worried that we might see applications fail at startup under baseline where they would have run ok under ion, and it could be something unpredictable and hard to debug. In a perfect world with OSR it would be cool to be able to stop running baseline when its stack runs out, wait for ion, and then swap to it, but I guess that's not practical...

Are the stack frames baseline allocates on the literal native stack? Or are they on a data structure on the heap that you control?
The frames are on the literal native stack.

So, if we run into really large frames and we don't want to try to shrink them (because we probably won't be able to, in realistic non-adversarial settings) we can choose to block on ion compilation.  In principle I think we could generate code at the point of frame allocation that calls out to C++ to wait for ion code, and when it returns we can jump to the ion code in the normal manner.  We could conceivably do this adaptively, eg, if more than half the stack has been consumed, or if the function has been invoked recursively, though I think that might be hard to control predictably.

In The Future where we decide to Ion-compile individual functions instead of individual modules it will be more meaningful to wait on Ion for a special function, perhaps.

Having a side stack for very large frames is a possibility, but then couldn't we just as well have a larger stack?  Or are we running into OS limits here that we can't easily work around?  If we had a separate reserved area for large frames it would be fairly simple to push/pop on that and keep a pointer to that frame in a local; we'd need to be sure to cache that frame pointer.  Probably debugging becomes more complicated too.

I'll give this a little more thought but for the baseline compiler it's definitely beneficial if the front end compiler packs the space of locals as much as it can.
If we see this problem in the wild, we could have a heuristic where baseline is disabled on individual functions that contain too many locals.  But even in Ion, there is a compile-time penalty for using tons of locals, so it seems better to rely on the toolchain to optimize locals.  Note that SM, V8, and probably other engines agreed on some fixed impl limits, one of which is the max number of locals in a function, 50,000.  Also, we probably won't be the only engine with a baseline for long, so eventually it won't be a FF-only limit either.
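As a hypothetical illustration of that limit (assuming it is enforced at validation time, so module compilation throws a CompileError):

  // Hypothetical sketch: a function declaring more than 50,000 locals
  // should be rejected during validation under the agreed impl limits.
  var over = "(module (func (local " +
             Array(50001).fill("i32").join(" ") + ")))";
  try {
    new WebAssembly.Module(wasmTextToBinary(over));
    print("unexpectedly accepted");
  } catch (e) {
    print(e instanceof WebAssembly.CompileError);  // expected: true
  }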
Yeah, once other VMs have baselines/interpreters etc I guess the risk here would decrease.

Also possible we won't see this in the wild, but that recent register allocator bug was triggered by a program with a massive number of locals ;)
Burying this for now.

(Agreed we may see the problem with a large number of locals in the wild, less sure whether we'll see deep recursions consisting of such large frames.)
OS: Unspecified → All
Priority: -- → P5
Hardware: Unspecified → All
Component: JavaScript Engine: JIT → Javascript: Web Assembly
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INACTIVE