Closed Bug 1854179 Opened 1 year ago Closed 1 year ago

Crash in OOM | large | thin_vec::header_with_capacity

Categories

(Toolkit :: Crash Reporting, defect)

Firefox 117
Desktop
Linux
defect

Tracking

()

RESOLVED FIXED
120 Branch
Tracking Status
firefox-esr115 --- unaffected
firefox117 --- wontfix
firefox118 --- wontfix
firefox119 --- fixed
firefox120 --- fixed

People

(Reporter: yannis, Assigned: gsvelto)

References

(Regression)

Details

(Keywords: regression)

Crash Data

Attachments

(2 files)

A high volume main process crash starting with 116 release, mostly from Ubuntu users. This main process crash is the consequence of a crash in a child process for which the annotations are unreasonably sized.

Example crash: here

Shared call stack for almost all crashes we received:

 0 firefox!MOZ_Crash(char const*, int, char const*) @ /build/firefox/parts/firefox/build/mfbt/Assertions.h:281
 1 firefox!mozalloc_abort @ /build/firefox/parts/firefox/build/memory/mozalloc/mozalloc_abort.cpp:35
 2 firefox!mozalloc_handle_oom(unsigned long) @ /build/firefox/parts/firefox/build/memory/mozalloc/mozalloc_oom.cpp:51
 3 libxul!mozglue_static::oom_hook::hook @ /build/firefox/parts/firefox/build/mozglue/static/rust/lib.rs:137
 4 libxul!rust_oom @ library/std/src/alloc.rs:355
 5 libxul!__rg_oom @ library/alloc/src/alloc.rs:423
 6 libxul!__rust_alloc_error_handler
 7 libxul!alloc::alloc::handle_alloc_error::rt_error @ library/alloc/src/alloc.rs:389
 8 libxul!alloc::alloc::handle_alloc_error @ library/alloc/src/alloc.rs:393
 9 libxul!thin_vec::header_with_capacity @ /build/firefox/parts/firefox/build/third_party/rust/thin-vec/src/lib.rs:414
 a libxul!thin_vec::ThinVec<T>::with_capacity @ /build/firefox/parts/firefox/build/third_party/rust/thin-vec/src/lib.rs:557
 b libxul!mozannotation_server::retrieve_annotations @ /build/firefox/parts/firefox/build/toolkit/crashreporter/mozannotation_server/src/lib.rs:80
 c libxul!mozannotation_retrieve @ /build/firefox/parts/firefox/build/toolkit/crashreporter/mozannotation_server/src/lib.rs:44
 d libxul!CrashReporter::OnChildProcessDumpRequested(void*, google_breakpad::ClientInfo const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) @ /build/firefox/parts/firefox/build/toolkit/crashreporter/nsExceptionHandler.cpp:3253
 e libxul!google_breakpad::CrashGenerationServer::ClientEvent(short) @ /build/firefox/parts/firefox/build/toolkit/crashreporter/breakpad-client/linux/crash_generation/crash_generation_server.cc:322
 f libxul!google_breakpad::CrashGenerationServer::Run() @ /build/firefox/parts/firefox/build/toolkit/crashreporter/breakpad-client/linux/crash_generation/crash_generation_server.cc:189
10 libxul!google_breakpad::CrashGenerationServer::ThreadMain(void*) @ /build/firefox/parts/firefox/build/toolkit/crashreporter/breakpad-client/linux/crash_generation/crash_generation_server.cc:379

Shared error for almost all crashes we received:

out of memory: 0x0000000773594008 bytes requested

We are trying to allocate around 29.8GB of memory when we retrieve annotations for the child process crash. We should put an arbitrary limit on the length of the annotations ThinVec and investigate what explains this recurrent allocation size of 0x0000000773594008 here.

I wonder if this crash could have some relation with bug 1685642 which also started spiking in 116?

Bug 1776197 which introduced the new system for pulling crash annotations landed in version 116, so there's could be a relation. I'll put a cap on how many annotations we retrieve from the child process as we already know it.

BTW the logic that finds the annotations in the child process does so by poking around in the ELF headers. Since this is Ubuntu-specific it's possible that something is happening with that. These are Snap builds from what I can tell and I can't be sure if the sandbox - or something else - is messing up the executable layout.

Signatures are not limited to canonical-002 so at least this is not snap-specific but impacts all users
Looks like the libxul.so path says otherwise, and this could be a data collection issue at crash generation?

Further investigation highlighted that we do get content process crashes from the affected builds - so it's not a build / executable layout problem - however we don't get rdd / utility / socket / gpu crashes. :gerard-majax confirmed that killing the utility process causes the main process to get stuck in a loop, eating a ton of CPU (all in the breakpad server thread, likely trying to read 30 GiB worth of annotations).

Assignee: nobody → gsvelto
Status: NEW → ASSIGNED
Pushed by gsvelto@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e5ac9fc1c8e4 Cap the maximum number of annotations we fetch from a child process r=gerard-majax
Blocks: 1855135
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 120 Branch

I just completed a build of snap right before bug 1776197 and one right after, and I can confirm the issue is directly related. Given that even after the current fix there is still a problem, it's not unlikely we are getting blocked by something at AppArmor level that got revealed by the move to bug 1776197

Keywords: regression
Regressed by: 1776197

Set release status flags based on info from the regressing bug 1776197

The patch landed in nightly and beta is affected.
:gsvelto, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox119 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(gsvelto)

Comment on attachment 9354570 [details]
Bug 1854179 - Cap the maximum number of annotations we fetch from a child process r=gerard-majax

Beta/Release Uplift Approval Request

  • User impact if declined: The entire browser crashes instead of a single tab (or an invisible process)
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This only caps the number of annotations we retrieve from the child process. With the patch applied we might retrieve some bogus annotations (most likely none at all), but we won't crash the main process. Note that this affects only Ubuntu users using the Snap build.
  • String changes made/needed:
  • Is Android affected?: No
Flags: needinfo?(gsvelto)
Attachment #9354570 - Flags: approval-mozilla-beta?

Comment on attachment 9354570 [details]
Bug 1854179 - Cap the maximum number of annotations we fetch from a child process r=gerard-majax

Approved for 119.0b3

Attachment #9354570 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

I'm CC'ing Ubuntu maintainers. This already went to beta but you might want to pick it up if you want to cut a new release to fix this issue for users. I don't think we can uplift this to release as it wouldn't be released unless we do a chemspill.

Attached file GitHub Pull Request

Gabriele, we have a planned dot release mid-cycle (next Tuesday), should we uplift it into 118.0.2?

Flags: needinfo?(gsvelto)

Any reason we shouldn't try_reserve there to do fallible allocation?

(In reply to Pascal Chevrel:pascalc from comment #16)

Gabriele, we have a planned dot release mid-cycle (next Tuesday), should we uplift it into 118.0.2?

Yes, that would help. In the meantime we've identified the root issue and I'm trying to fix it, but it will take some time. This prevents the crash and yields a fairly complete crash report even when we fail to read the child process annotations.

(In reply to Emilio Cobos Álvarez (:emilio) from comment #17)

Any reason we shouldn't try_reserve there to do fallible allocation?

We know the maximum number of annotations that are in the child process so we should enforce that. I will use your suggestion for the code that loads the individual annotations since there's no hard cap on their size and we don't want the child process to cause DOS issues to the main process in case one annotation is too large.

Flags: needinfo?(gsvelto)
See Also: → 1856899
See Also: → 1858390
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: