Closed Bug 1609308 Opened 5 years ago Closed 4 years ago

Try to better handle OOMs in Cranelift

Categories

(Core :: JavaScript: WebAssembly, defect, P5)

defect

Tracking

()

RESOLVED INACTIVE
Tracking Status
firefox74 --- affected

People

(Reporter: gkw, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Attached file stack

I have a rust OOM backtrace with a non-reproducible testcase on m-c rev e728bf01a2b6 found by funbind. Command line arguments involved --fuzzing-safe --wasm-compiler=cranelift --no-asmjs --enable-avx --parser-deferred-alloc --blinterp --blinterp-warmup-threshold=1 --more-compartments --nursery-strings=off --ion-offthread-compile=off --no-cgc --gc-zeal=23,473 --no-incremental-gc --baseline-warmup-threshold=1 --no-threads (probably not all needed) and this was a --disable-profiling build.

stderr shows memory allocation of 131072 bytes failedRedirecting call to abort() to mozalloc_abort.

Backtrace:

#0  mozalloc_abort (msg=<optimized out>) at /home/ubuntu/trees/mozilla-central/memory/mozalloc/mozalloc_abort.cpp:33
#1  0x000056107d31111d in abort () at /home/ubuntu/trees/mozilla-central/memory/mozalloc/mozalloc_abort.cpp:82
#2  0x000056107e2829f7 in std::sys::unix::abort_internal () at src/libstd/sys/unix/mod.rs:155
#3  0x000056107e271cad in rust_oom () at src/libstd/alloc.rs:217
#4  0x000056107e295f47 in alloc::alloc::handle_alloc_error () at src/liballoc/alloc.rs:247
#5  0x000056107e196c76 in alloc::raw_vec::RawVec<T,A>::reserve_internal (self=0x7fe3d5158328, used_capacity=<optimized out>, needed_extra_capacity=<optimized out>, fallibility=alloc::raw_vec::Fallibility::Infallible, strategy=<optimized out>) at /rustc/73528e339aae0f17a15ffa49a8ac608f50c6cf14/src/liballoc/raw_vec.rs:699
#6  0x000056107e19c972 in alloc::raw_vec::RawVec<T,A>::reserve (self=0x7fe3d578c703 <_IO_2_1_stderr_+131>, used_capacity=140616515770544, needed_extra_capacity=94628837020416) at /rustc/73528e339aae0f17a15ffa49a8ac608f50c6cf14/src/liballoc/raw_vec.rs:520
#7  0x000056107e1867cd in alloc::vec::Vec<T>::extend_with (self=0x7fe3d5158328, n=1, value=...) at /rustc/73528e339aae0f17a15ffa49a8ac608f50c6cf14/src/liballoc/vec.rs:1585
#8  0x000056107e18bc26 in alloc::vec::Vec<T>::resize (self=0x7fe3d5158328, new_len=<optimized out>, value=...) at /rustc/73528e339aae0f17a15ffa49a8ac608f50c6cf14/src/liballoc/vec.rs:1483
#9  0x000056107e1d0896 in <cranelift_entity::map::SecondaryMap<K,V> as core::ops::index::IndexMut<K>>::index_mut (self=0x7fe3d5158328, k=...) at /home/ubuntu/trees/mozilla-central/third_party/rust/cranelift-entity/src/map.rs:162
#10 cranelift_codegen::ir::layout::Layout::insert_inst (self=0x7fe3d51582f8, inst=..., before=...) at /home/ubuntu/trees/mozilla-central/third_party/rust/cranelift-codegen/src/ir/layout.rs:584
#11 0x000056107e1c3940 in <&mut cranelift_codegen::cursor::EncCursor as cranelift_codegen::ir::builder::InstInserterBase>::insert_built_inst (self=0x7ffcc74ab110, inst=..., ctrl_typevar=...) at /home/ubuntu/trees/mozilla-central/third_party/rust/cranelift-codegen/src/cursor.rs:793
#12 0x000056107e1c6af7 in <cranelift_codegen::ir::builder::InsertBuilder<IIB> as cranelift_codegen::ir::builder::InstBuilderBase>::build (self=..., data=..., ctrl_typevar=...) at /home/ubuntu/trees/mozilla-central/third_party/rust/cranelift-codegen/src/ir/builder.rs:132
#13 0x000056107e21ac4d in cranelift_codegen::ir::builder::InstBuilder::Unary (self=..., opcode=<optimized out>, ctrl_typevar=..., arg0=...) at /home/ubuntu/shell-cache/js-64-profDisabled-linux-x86_64-e728bf01a2b6/objdir-js/x86_64-unknown-linux-gnu/release/build/cranelift-codegen-c0f60a350baac231/out/inst_builder.rs:5110
#14 0x000056107e2160b3 in cranelift_codegen::ir::builder::InstBuilder::fill (self=..., x=...) at /home/ubuntu/shell-cache/js-64-profDisabled-linux-x86_64-e728bf01a2b6/objdir-js/x86_64-unknown-linux-gnu/release/build/cranelift-codegen-c0f60a350baac231/out/inst_builder.rs:1436
/snip

Benjamin mentioned over email to file a bug as a start.

Flags: needinfo?(bbouvier)
Summary: Cranelift OOM crash [@ cranelift_codegen::ir::builder::InstBuilder::Unary] → Crash [@ cranelift_codegen::ir::builder::InstBuilder::Unary] (Cranelift OOM)

FWIW, Ion aborts when failing to allocate LIR nodes, so it's not a given that this is a bug. It's weird to run out of memory for almost any test program on desktop systems, but it's not unprecedented, see eg bug 1561127. That's for Ion, of course, but if you read through that bug you'll see Cranelift has a similar problem, only it grows more slowly.

(It's of course possible the present problem is a genuine bug so it should be investigated, and we do need a general strategy for OOM handling.)

Priority: -- → P3

I've discussed this with other people, and apparently Rust doesn't provide a lot of ways to check that a memory allocation failed or not. In this case, this is a Vec allocation failing, and Ryan told me we could use Vec::try_reserve which will fail if there's not enough memory. We're using a bunch of vectors that get resized a lot in Cranelift (SecondaryMaps), so it might not be efficient to use try_reserve every time. A few things we could do:

  • limit the size of wasm bytecode that Cranelift can handle. This gives an upper boundary to all the vectors used in Cranelift, knowing that a single wasm opcode might be converted into a small group of CLIR instructions.
  • we could try to pre-reserve size for all of these vectors, and fail early if any of these preallocations fail. It's not a silver bullet, because the vector could still need to be bigger than what we preallocated, but it might help (and then we can do slower reallocations).
  • there are also forks of the other std lib containers that contain methods to fallibly allocate, e.g. https://github.com/Manishearth/hashglobe for hashmap.

I'll try to look into this, unless somebody beats me to it.

Flags: needinfo?(bbouvier)
Blocks: cranelift
Summary: Crash [@ cranelift_codegen::ir::builder::InstBuilder::Unary] (Cranelift OOM) → Try to better handle OOMs in Cranelift

Closing because no crashes reported for 12 weeks.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME

This is not fixed at all, Cranelift can still cause OOMs while allocating internal Rust data-structures.

Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---

Removing crash signature, so that release management doesn't automatically close this bug in the future.

Crash Signature: [@ cranelift_codegen::ir::builder::InstBuilder::Unary]
Severity: normal → S3
Blocks: wasm-oom
Priority: P3 → P5

Julian, can you upstream this to the alliance and then close this one? Thanks.

Flags: needinfo?(jseward)

Information from Benjamin Bouvier:

  1. As far as is known, there is no PR for OOM behaviour in Cranelift
    currently.

  2. At present, OOM in Rust causes a panic.

  3. However, that can be changed

    https://doc.rust-lang.org/alloc/alloc/fn.handle_alloc_error.html

    The default behavior of this function is to print a message to standard error
    and abort the process. It can be replaced with set_alloc_error_hook and
    take_alloc_error_hook.

  4. Gecko already uses that

    https://searchfox.org/mozilla-central/source/toolkit/library/rust/shared/lib.rs#138

  5. So maybe this could be fixed purely on the SpiderMonkey side only?

Flags: needinfo?(jseward)

There is no active work on Cranelift.

Status: REOPENED → RESOLVED
Closed: 5 years ago4 years ago
Resolution: --- → INACTIVE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: