Explosive Ion memory use on page using WebAssembly
Categories
(Core :: JavaScript Engine: JIT, defect, P3)
People
(Reporter: keno, Unassigned)
References
(Blocks 1 open bug)
Attachments
(1 file)
3.13 MB, application/octet-stream
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36
Steps to reproduce:
Navigate to a page embedding the (WIP, prototype) port of Julia to WebAssembly. For your convenience, I've hosted one here: https://keno.github.io/julia-wasm/website/repl.htm.
Actual results:
Firefox consumes all available system memory and becomes unresponsive after a few seconds. No excess memory use is recorded in the memory profiler in the developer tools.
Expected results:
Memory use should remain stable (especially while no WebAssembly code is running).
Comment 1•5 years ago
I think the issue may be with the Julia REPL here: in Safari, the above link similarly grows to 10 GB before Safari kills and restarts the tab. In Chrome, the page errors out with an over-recursion error. Most of the time when it errors out, memory stays at 0.5 GB, but one time I saw Chrome also grow to 15 GB before I killed it. about:memory shows a bunch of vmem, so I'm guessing the app is repeatedly allocating memories without allowing them to be GC'd?
Quite possibly. This stuff isn't particularly stable yet and appears to depend on compilation options. At the very least, though, I would have expected the memory to show up in the dev tools.
Comment 3•5 years ago
Right, about:memory is the usual go-to tool for this sort of question, but in this case I just see all the memory in "heap-unclassified" which isn't very helpful. So that'd be good to fix.
I'm still not entirely sure I understand what's going on here unfortunately, for two reasons:
- The memory growth continues even if everything is paused in the debugger, so I wouldn't have expected any actual allocations to be possible on the app side
- I instrumented all the places where it touches wasm.Memory with console.log and I see those being called the expected number of times. It's possible that I missed something of course, but it's still odd.
Is there an easy way to take a (native) backtrace, to figure out what in the browser is sitting in the allocation while this memory growth is occurring?
As for Safari, at least on my machine it doesn't seem to finish wasm instantiation at all, so I suspect that may be a different disease with the same symptoms.
Running this on a machine with more memory, this isn't actually a memory leak. Memory consumption goes up to 44 GB and then comes right back down to about 300 MB. Is it possible that some accounting is missing, so that the GC doesn't realize some of the WebAssembly objects are large and ends up not freeing them until some large collection later? (I have no idea how the Firefox GC works; I'm just extrapolating from my knowledge of GC'ed systems.)
I did some more debugging, and it looks like all the memory allocation is done by helper threads performing wasm compilation using IonMonkey. So I don't think this is related to anything the script itself is doing, but rather to memory usage during compilation of the wasm file.
Comment 7•5 years ago
Ah, interesting. Thanks for looking into that; that could explain why there is so much memory in heap-unclassified; the temporary compilation LifoAllocs probably aren't instrumented by about:memory.
I expect there is something real to fix here on the FF side to reduce memory usage during compilation that we should look into. In the meantime, since this issue seems to affect all browser engines, there's probably something about the code being compiled that triggers pathological memory usage. In my experience, the usual culprit is large functions with many loops and many local variables (which tends to produce quadratic creation of phi nodes in SSA-based compilers). As soon as I can get some time (on vacation for a week), I'll instrument a browser and try to get more details.
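To make the phi-node remark concrete, here is a minimal counting sketch. It assumes a simple one-pass SSA builder that places one phi per local at every loop header; that model and the per-unit loop and local counts are illustrative assumptions, not a description of what Ion or any other engine actually does.

```python
def count_loop_header_phis(num_loops: int, num_locals: int) -> int:
    """Count phis if every loop header gets one phi per local (assumed model)."""
    phis = 0
    for _loop in range(num_loops):
        # A one-pass builder has not seen the loop body yet, so it cannot tell
        # which locals the body modifies; it conservatively adds a phi for each.
        phis += num_locals
    return phis


def phis_for_scaled_function(scale: int, loops_per_unit: int = 10,
                             locals_per_unit: int = 20) -> int:
    """Hypothetical function whose loop and local counts both grow with `scale`."""
    return count_loop_header_phis(scale * loops_per_unit, scale * locals_per_unit)


if __name__ == "__main__":
    for scale in (1, 2, 4, 8):
        print(scale, phis_for_scaled_function(scale))
```

Under this model, doubling the size of the function quadruples the number of phis, which is the quadratic behaviour described above.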
Chrome is actually fine here on the compilation side. It has some problems with stack usage that I'm looking into, but is otherwise fairly usable. Firefox is also usable if I use the baseline JIT. Safari never worked for this code at all, so I wasn't too worried about it for the moment (until I get this more stable on Firefox/Chrome).
Comment 9•5 years ago
Absolutely reproducible; if I disable Ion for wasm compilation the page is fine, but with Ion enabled it consumes immense amounts of memory.
Comment 10•5 years ago
Top function sizes in the .wasm by bytecode count:
Size: 1016787
Size: 644204
Size: 292801
Size: 222272
Size: 207220
Size: 191896
Size: 188876
Size: 165871
Size: 163908
Size: 151668
Size: 151645
Size: 145077
Size: 138221
Size: 132069
Size: 130413
Size: 121995
Size: 121042
Size: 119190
Size: 118717
Size: 110356
Size: 108934
Size: 107413
Size: 106743
Size: 103145
Size: 102043
Size: 101313
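For anyone who wants to produce a similar list for another module, a rough sketch follows. It is not the tool used for the numbers above; it reads the size prefix of each entry in the code section (section id 10) of the WebAssembly binary format, so each size includes the local declarations as well as the instructions. The helper names are made up for illustration.

```python
from typing import List, Tuple


def read_uleb128(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def function_body_sizes(module: bytes) -> List[int]:
    """Return the byte size of every function body in the code section."""
    if module[:4] != b"\0asm":
        raise ValueError("not a WebAssembly binary")
    pos = 8  # skip the 4-byte magic and the 4-byte version
    while pos < len(module):
        section_id = module[pos]
        section_size, pos = read_uleb128(module, pos + 1)
        if section_id == 10:  # code section
            count, p = read_uleb128(module, pos)
            sizes = []
            for _ in range(count):
                body_size, p = read_uleb128(module, p)
                sizes.append(body_size)
                p += body_size  # skip over the body itself
            return sizes
        pos += section_size  # skip any other section
    return []


def largest_bodies(module: bytes, n: int = 10) -> List[int]:
    """The n largest function body sizes, biggest first."""
    return sorted(function_body_sizes(module), reverse=True)[:n]
```

Usage would be something like `largest_bodies(open("hello.wasm", "rb").read(), 26)`, where the file name is whatever module is being inspected.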
Comment 11•5 years ago
perf says most of the time (I'm eyeballing > 75%) is being spent in the register allocator, with 33% in resolveControlFlow alone.
There's a four-deep loop in that function, starting around line 2029 (one of the loops is hidden inside rangeFor()), that's probably the problem.
The performance profile doesn't really speak to the memory growth, of course, but this loop introduces moves and it looks like it introduces one for each iteration of the level-3 loop.
Comment 12•5 years ago
Once the four-deep loop exits, the program falls into the five-deep loop right after it.
Comment 13•5 years ago
Letting it run longer, MBasicBlock::addPredecessorPopN starts to dominate, and the annotation shows e.g. Vector::infallibleGrowUninitialized.
$ perf record ~/m-i/js/src/build-release/dist/bin/js --wasm-compiler=ion --no-threads hello.js
where hello.js just loads hello.wasm from the test case and calls new WebAssembly.Module on it.
Comment 14•5 years ago
Repeating the same experiment with Cranelift: after running for a half hour (Cranelift is very slow) we're finally into swap, 26GB resident in the js process.
And then the process aborts with the message "memory allocation of 17,179,869,184 bytes failed" (those are my commas).
We should investigate this a little further, but the evidence is that this is not an Ion bug but an artifact of constructing SSA form on a very large function, and it's possible that we need some kind of ceiling on the function size or on the intermediate form size to avoid this type of pathological result.
Comment 15•5 years ago
Where did the Cranelift allocation fail, out of curiosity?
If this is during register allocation too, would it make sense to fall back to using a very dumb register allocator that can operate in linear memory, whenever we observe that a given function exceeds a (SSA names number / loop depth / instruction number / something) threshold?
Comment 16•5 years ago
I have no more information at this time. This was in a release build. The urgent issue is really that Ion fails, not that Cranelift fails, but yes, some kind of limit or fallback is desirable. A fallback to the baseline compiler is clearly possible and may be good enough.
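A sketch of what such a limit-plus-fallback could look like, purely for discussion. The thresholds, the `FunctionStats` fields and the tier names are hypothetical; nothing here reflects limits that exist in the engine.

```python
from dataclasses import dataclass


@dataclass
class FunctionStats:
    """Cheap per-function measurements available before optimizing (assumed)."""
    bytecode_size: int
    max_loop_depth: int


# Hypothetical ceilings; real values would have to come from measurement.
MAX_OPTIMIZED_BYTECODE_SIZE = 500_000
MAX_OPTIMIZED_LOOP_DEPTH = 64


def choose_compiler(stats: FunctionStats) -> str:
    """Pick a tier: fall back to baseline when a function looks pathological."""
    if stats.bytecode_size > MAX_OPTIMIZED_BYTECODE_SIZE:
        return "baseline"
    if stats.max_loop_depth > MAX_OPTIMIZED_LOOP_DEPTH:
        return "baseline"
    return "optimizing"
```

With these made-up numbers, the largest function in the list from comment 10 (1016787 bytes) would be sent to the baseline compiler, while ordinary functions would still be optimized. Checking before compilation starts keeps the decision cheap, at the cost of sometimes rejecting a large function that would have compiled fine.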
Comment 17•5 years ago
Crashes in liveness analysis in this case.
#4 0x000055555673ac47 in alloc::alloc::handle_alloc_error () at src/liballoc/alloc.rs:227
#5 0x0000555556670001 in alloc::raw_vec::RawVec<T,A>::reserve_internal (self=0x7ffff7859410, used_cap=<optimized out>, needed_extra_cap=<optimized out>, fallibility=<optimized out>, strategy=<optimized out>) at /rustc/3c235d5600393dfe6c36eeed34042efad8d4f26e/src/liballoc/raw_vec.rs:672
#6 0x0000555556675222 in alloc::raw_vec::RawVec<T,A>::reserve (self=0x0, used_cap=140737349973795, needed_extra_cap=140737349978288) at /rustc/3c235d5600393dfe6c36eeed34042efad8d4f26e/src/liballoc/raw_vec.rs:491
#7 0x0000555556665ed5 in alloc::vec::Vec<T>::push (self=0x7ffff7859410, value=...) at /rustc/3c235d5600393dfe6c36eeed34042efad8d4f26e/src/liballoc/vec.rs:1023
#8 0x0000555556614083 in cranelift_entity::primary::PrimaryMap<K,V>::push (self=0x7ffff7859410, v=...) at cranelift-entity/src/primary.rs:120
#9 0x000055555661c172 in cranelift_bforest::pool::NodePool<F>::alloc_node (self=0x7ffff7859410, data=...) at cranelift-bforest/src/pool.rs:47
#10 0x0000555556615469 in cranelift_bforest::map::MapCursor<K,V,C>::insert (self=0x7fffffff9668, key=..., value=...) at cranelift-bforest/src/map.rs:351
#11 0x00005555566ba670 in cranelift_codegen::regalloc::liverange::GenLiveRange<PO>::extend_in_ebb (self=<optimized out>, ebb=..., to=..., order=0x7ffff78591f8, forest=0x7ffff7859410) at cranelift-codegen/src/regalloc/liverange.rs:298
#12 0x00005555566b9cb9 in cranelift_codegen::regalloc::liveness::extend_to_use (lr=0x7ff246e622e0, ebb=..., to=..., worklist=0x7ffff7859430, func=<optimized out>, cfg=0x7ffff7859318, forest=0x7ffff7859410) at cranelift-codegen/src/regalloc/liveness.rs:259
#13 0x00005555566ba32e in cranelift_codegen::regalloc::liveness::Liveness::compute (self=0x7ffff78593d8, isa=..., func=<optimized out>, cfg=0x7ffff7859318) at cranelift-codegen/src/regalloc/liveness.rs:434
#14 0x00005555566bf325 in cranelift_codegen::regalloc::context::Context::run (self=0x7ffff78593d8, isa=..., func=0x7ffff7859020, cfg=0x7ffff7859318, domtree=0x7ffff7859380) at cranelift-codegen/src/regalloc/context.rs:96
#15 0x00005555566ad81a in cranelift_codegen::context::Context::regalloc (self=0x0, isa=...) at cranelift-codegen/src/context.rs:309
#16 0x00005555566acf7a in cranelift_codegen::context::Context::compile (self=0x7ffff7859020, isa=...) at cranelift-codegen/src/context.rs:146
#17 0x00005555564f0b5b in baldrdash::compile::BatchCompiler::compile (self=<optimized out>) at js/src/wasm/cranelift/src/compile.rs:112
#18 0x00005555564efe61 in cranelift_compile_function (compiler=<optimized out>, data=<optimized out>, result=0x7fffffff9d80) at js/src/wasm/cranelift/src/lib.rs:97
#19 0x00005555563f4187 in js::wasm::CraneliftCompileFunctions (env=..., lifo=..., inputs=..., code=0x7fffe9da8b10, error=0x7fffffffb6b8) at js/src/wasm/WasmCraneliftCompile.cpp:413
#20 0x00005555564391be in ExecuteCompileTask (task=0x7fffe9da8800, error=0x0) at js/src/wasm/WasmGenerator.cpp:728
Comment 18•5 years ago
A little logging in Ion shows that there's at the very least a block that has 5645 predecessors, and memory use grows explosively while we are building the intermediate form for the function containing that. That's the function with over 1 MB of bytecode. We've compiled large functions before this, but it's the first function we've encountered with a node that has more than 1000 predecessors. We're in addPredecessorPopN now - it looks like we go quadratic here. But even getting to 1000 grows memory up to about 13 GB resident, so I think there's additional overhead elsewhere. Really need to do some proper memory profiling.
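The kind of check that logging amounts to can be sketched outside the engine as follows, assuming the control-flow graph is available as a list of (source block, destination block) edges. The edge-list representation and the function name are assumptions for illustration; the actual logging was done inside Ion.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def blocks_with_many_predecessors(edges: Iterable[Tuple[int, int]],
                                  threshold: int) -> List[Tuple[int, int]]:
    """Return (block, predecessor count) pairs above threshold, largest first."""
    # Each edge (src, dst) contributes one predecessor to dst.
    counts = Counter(dst for _src, dst in edges)
    heavy = [(block, n) for block, n in counts.items() if n > threshold]
    return sorted(heavy, key=lambda pair: pair[1], reverse=True)
```

Running this with a threshold of 1000 over a function's edges would flag a join block like the one described above.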
Comment 19•5 years ago (Reporter)
FWIW, after some investigation into why this .wasm was so pathological, we found that we'd accidentally forgotten to run binaryen's bysyncify with optimizations enabled, so it generated pretty bad code. With optimizations, the same wasm file works OK. There's probably still something to be addressed in Firefox here, but I just wanted to mention how this particular wasm file came to be.
Comment 20•5 years ago
Thanks, that's useful to know, and takes some pressure off. Still, Firefox should do something sensible even when presented with the less-optimal input, as it almost amounts to a DoS attack :)
Comment 21•5 years ago (Reporter)
I'm about to switch out the deployed version of the wasm file for one that doesn't trigger this issue. I'm attaching an (LZMA-compressed) version of the original .wasm file here.
Comment 22•5 years ago
Thank you!
Comment 23•5 years ago
It seems plausible that if we use dense data structures (vectors) to represent sparse data (live ranges) and we allocate the vectors from non-freeing arenas (pools) using the usual doubling size, and we have one of these vectors per node, say, and there are many live ranges and many nodes, then memory usage for the pool could grow completely out of hand fairly easily. If we use a less-efficient allocation scheme than doubling, like adding a fixed number of elements per grow, then it could be worse still.
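A back-of-the-envelope model of that effect, assuming every regrowth takes a fresh buffer from the arena and the old buffer is never returned. The function names and the element size are illustrative; this is not a model of LifoAlloc specifically.

```python
from typing import Callable


def arena_bytes_for_vector(final_len: int, elem_size: int,
                           grow: Callable[[int], int]) -> int:
    """Bytes taken from a non-freeing arena while one vector grows to final_len."""
    capacity = 0
    total = 0
    while capacity < final_len:
        capacity = grow(capacity)
        # The old buffer stays allocated in the arena, so every buffer counts.
        total += capacity * elem_size
    return total


def doubling(capacity: int) -> int:
    """Usual growth policy: double the capacity (starting from 1)."""
    return max(1, capacity * 2)


def fixed_increment(step: int) -> Callable[[int], int]:
    """Less efficient policy: add a fixed number of elements per grow."""
    return lambda capacity: capacity + step


if __name__ == "__main__":
    n, size = 1024, 8
    print("doubling:", arena_bytes_for_vector(n, size, doubling))
    print("fixed +8:", arena_bytes_for_vector(n, size, fixed_increment(8)))
```

For a single vector, doubling costs roughly twice the final buffer size, whereas a fixed increment costs an amount quadratic in the final length. Multiply either figure by one vector per node and the pool can get very large.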
Comment 24•5 years ago
Seems like a problem that Ion might want to guard against, somehow? Also, I assume this is not really wasm-specific.
Comment 25•5 years ago
(In reply to Julian Seward [:jseward] from comment #24)
> Seems like a problem that Ion might want to guard against, somehow? Also, I assume this is not really wasm-specific.
It's probably wasm-specific because we have limits on script size, number of locals, etc for JS compilation...
It would be good to figure out where we actually spend the most time (and memory allocation). Lars mentioned resolveControlFlow, but that does a bunch of different things, so it would help to know which loop is the main culprit.