Try to better handle OOMs in Cranelift
Categories
(Core :: JavaScript: WebAssembly, defect, P5)
Tracking
()
Tracking | Status | |
---|---|---|
firefox74 | --- | affected |
People
(Reporter: gkw, Unassigned)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
9.11 KB,
text/plain
|
Details |
I have a rust OOM backtrace with a non-reproducible testcase on m-c rev e728bf01a2b6 found by funbind. Command line arguments involved --fuzzing-safe --wasm-compiler=cranelift --no-asmjs --enable-avx --parser-deferred-alloc --blinterp --blinterp-warmup-threshold=1 --more-compartments --nursery-strings=off --ion-offthread-compile=off --no-cgc --gc-zeal=23,473 --no-incremental-gc --baseline-warmup-threshold=1 --no-threads
(probably not all needed) and this was a --disable-profiling
build.
stderr shows memory allocation of 131072 bytes failedRedirecting call to abort() to mozalloc_abort
.
Backtrace:
#0 mozalloc_abort (msg=<optimized out>) at /home/ubuntu/trees/mozilla-central/memory/mozalloc/mozalloc_abort.cpp:33
#1 0x000056107d31111d in abort () at /home/ubuntu/trees/mozilla-central/memory/mozalloc/mozalloc_abort.cpp:82
#2 0x000056107e2829f7 in std::sys::unix::abort_internal () at src/libstd/sys/unix/mod.rs:155
#3 0x000056107e271cad in rust_oom () at src/libstd/alloc.rs:217
#4 0x000056107e295f47 in alloc::alloc::handle_alloc_error () at src/liballoc/alloc.rs:247
#5 0x000056107e196c76 in alloc::raw_vec::RawVec<T,A>::reserve_internal (self=0x7fe3d5158328, used_capacity=<optimized out>, needed_extra_capacity=<optimized out>, fallibility=alloc::raw_vec::Fallibility::Infallible, strategy=<optimized out>) at /rustc/73528e339aae0f17a15ffa49a8ac608f50c6cf14/src/liballoc/raw_vec.rs:699
#6 0x000056107e19c972 in alloc::raw_vec::RawVec<T,A>::reserve (self=0x7fe3d578c703 <_IO_2_1_stderr_+131>, used_capacity=140616515770544, needed_extra_capacity=94628837020416) at /rustc/73528e339aae0f17a15ffa49a8ac608f50c6cf14/src/liballoc/raw_vec.rs:520
#7 0x000056107e1867cd in alloc::vec::Vec<T>::extend_with (self=0x7fe3d5158328, n=1, value=...) at /rustc/73528e339aae0f17a15ffa49a8ac608f50c6cf14/src/liballoc/vec.rs:1585
#8 0x000056107e18bc26 in alloc::vec::Vec<T>::resize (self=0x7fe3d5158328, new_len=<optimized out>, value=...) at /rustc/73528e339aae0f17a15ffa49a8ac608f50c6cf14/src/liballoc/vec.rs:1483
#9 0x000056107e1d0896 in <cranelift_entity::map::SecondaryMap<K,V> as core::ops::index::IndexMut<K>>::index_mut (self=0x7fe3d5158328, k=...) at /home/ubuntu/trees/mozilla-central/third_party/rust/cranelift-entity/src/map.rs:162
#10 cranelift_codegen::ir::layout::Layout::insert_inst (self=0x7fe3d51582f8, inst=..., before=...) at /home/ubuntu/trees/mozilla-central/third_party/rust/cranelift-codegen/src/ir/layout.rs:584
#11 0x000056107e1c3940 in <&mut cranelift_codegen::cursor::EncCursor as cranelift_codegen::ir::builder::InstInserterBase>::insert_built_inst (self=0x7ffcc74ab110, inst=..., ctrl_typevar=...) at /home/ubuntu/trees/mozilla-central/third_party/rust/cranelift-codegen/src/cursor.rs:793
#12 0x000056107e1c6af7 in <cranelift_codegen::ir::builder::InsertBuilder<IIB> as cranelift_codegen::ir::builder::InstBuilderBase>::build (self=..., data=..., ctrl_typevar=...) at /home/ubuntu/trees/mozilla-central/third_party/rust/cranelift-codegen/src/ir/builder.rs:132
#13 0x000056107e21ac4d in cranelift_codegen::ir::builder::InstBuilder::Unary (self=..., opcode=<optimized out>, ctrl_typevar=..., arg0=...) at /home/ubuntu/shell-cache/js-64-profDisabled-linux-x86_64-e728bf01a2b6/objdir-js/x86_64-unknown-linux-gnu/release/build/cranelift-codegen-c0f60a350baac231/out/inst_builder.rs:5110
#14 0x000056107e2160b3 in cranelift_codegen::ir::builder::InstBuilder::fill (self=..., x=...) at /home/ubuntu/shell-cache/js-64-profDisabled-linux-x86_64-e728bf01a2b6/objdir-js/x86_64-unknown-linux-gnu/release/build/cranelift-codegen-c0f60a350baac231/out/inst_builder.rs:1436
/snip
Benjamin mentioned over email to file a bug as a start.
![]() |
Reporter | |
Updated•5 years ago
|
Comment 1•5 years ago
|
||
FWIW, Ion aborts when failing to allocate LIR nodes, so it's not a given that this is a bug. It's weird to run out of memory for almost any test program on desktop systems, but it's not unprecedented, see eg bug 1561127. That's for Ion, of course, but if you read through that bug you'll see Cranelift has a similar problem, only it grows more slowly.
(It's of course possible the present problem is a genuine bug so it should be investigated, and we do need a general strategy for OOM handling.)
Updated•5 years ago
|
Comment 2•5 years ago
|
||
I've discussed this with other people, and apparently Rust doesn't provide a lot of ways to check that a memory allocation failed or not. In this case, this is a Vec allocation failing, and Ryan told me we could use Vec::try_reserve
which will fail if there's not enough memory. We're using a bunch of vectors that get resized a lot in Cranelift (SecondaryMaps
), so it might not be efficient to use try_reserve
every time. A few things we could do:
- limit the size of wasm bytecode that Cranelift can handle. This gives an upper boundary to all the vectors used in Cranelift, knowing that a single wasm opcode might be converted into a small group of CLIR instructions.
- we could try to pre-reserve size for all of these vectors, and fail early if any of these preallocations fail. It's not a silver bullet, because the vector could still need to be bigger than what we preallocated, but it might help (and then we can do slower reallocations).
- there are also forks of the other std lib containers that contain methods to fallibly allocate, e.g. https://github.com/Manishearth/hashglobe for hashmap.
I'll try to look into this, unless somebody beats me to it.
Updated•5 years ago
|
Comment 3•5 years ago
|
||
Closing because no crashes reported for 12 weeks.
Comment 4•5 years ago
|
||
This is not fixed at all, Cranelift can still cause OOMs while allocating internal Rust data-structures.
Comment 5•5 years ago
|
||
Removing crash signature, so that release management doesn't automatically close this bug in the future.
Updated•5 years ago
|
Updated•4 years ago
|
Comment 6•4 years ago
|
||
Julian, can you upstream this to the alliance and then close this one? Thanks.
Comment 8•4 years ago
|
||
Information from Benjamin Bouvier:
-
As far as is known, there is no PR for OOM behaviour in Cranelift
currently. -
At present, OOM in Rust causes a panic.
-
However, that can be changed
https://doc.rust-lang.org/alloc/alloc/fn.handle_alloc_error.html
The default behavior of this function is to print a message to standard error
and abort the process. It can be replaced withset_alloc_error_hook
and
take_alloc_error_hook
. -
Gecko already uses that
https://searchfox.org/mozilla-central/source/toolkit/library/rust/shared/lib.rs#138
-
So maybe this could be fixed purely on the SpiderMonkey side only?
Comment 9•4 years ago
|
||
There is no active work on Cranelift.
Description
•