Crash in OOM | large | thin_vec::header_with_capacity
Categories
(Toolkit :: Crash Reporting, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr115 | --- | unaffected |
firefox117 | --- | wontfix |
firefox118 | --- | wontfix |
firefox119 | --- | fixed |
firefox120 | --- | fixed |
People
(Reporter: yannis, Assigned: gsvelto)
References
(Regression)
Details
(Keywords: regression)
Crash Data
Attachments
(2 files)
48 bytes,
text/x-phabricator-request
|
diannaS
:
approval-mozilla-beta+
|
Details | Review |
49 bytes,
text/x-github-pull-request
|
Details | Review |
A high volume main process crash starting with 116 release, mostly from Ubuntu users. This main process crash is the consequence of a crash in a child process for which the annotations are unreasonably sized.
Example crash: here
Shared call stack for almost all crashes we received:
0 firefox!MOZ_Crash(char const*, int, char const*) @ /build/firefox/parts/firefox/build/mfbt/Assertions.h:281
1 firefox!mozalloc_abort @ /build/firefox/parts/firefox/build/memory/mozalloc/mozalloc_abort.cpp:35
2 firefox!mozalloc_handle_oom(unsigned long) @ /build/firefox/parts/firefox/build/memory/mozalloc/mozalloc_oom.cpp:51
3 libxul!mozglue_static::oom_hook::hook @ /build/firefox/parts/firefox/build/mozglue/static/rust/lib.rs:137
4 libxul!rust_oom @ library/std/src/alloc.rs:355
5 libxul!__rg_oom @ library/alloc/src/alloc.rs:423
6 libxul!__rust_alloc_error_handler
7 libxul!alloc::alloc::handle_alloc_error::rt_error @ library/alloc/src/alloc.rs:389
8 libxul!alloc::alloc::handle_alloc_error @ library/alloc/src/alloc.rs:393
9 libxul!thin_vec::header_with_capacity @ /build/firefox/parts/firefox/build/third_party/rust/thin-vec/src/lib.rs:414
a libxul!thin_vec::ThinVec<T>::with_capacity @ /build/firefox/parts/firefox/build/third_party/rust/thin-vec/src/lib.rs:557
b libxul!mozannotation_server::retrieve_annotations @ /build/firefox/parts/firefox/build/toolkit/crashreporter/mozannotation_server/src/lib.rs:80
c libxul!mozannotation_retrieve @ /build/firefox/parts/firefox/build/toolkit/crashreporter/mozannotation_server/src/lib.rs:44
d libxul!CrashReporter::OnChildProcessDumpRequested(void*, google_breakpad::ClientInfo const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) @ /build/firefox/parts/firefox/build/toolkit/crashreporter/nsExceptionHandler.cpp:3253
e libxul!google_breakpad::CrashGenerationServer::ClientEvent(short) @ /build/firefox/parts/firefox/build/toolkit/crashreporter/breakpad-client/linux/crash_generation/crash_generation_server.cc:322
f libxul!google_breakpad::CrashGenerationServer::Run() @ /build/firefox/parts/firefox/build/toolkit/crashreporter/breakpad-client/linux/crash_generation/crash_generation_server.cc:189
10 libxul!google_breakpad::CrashGenerationServer::ThreadMain(void*) @ /build/firefox/parts/firefox/build/toolkit/crashreporter/breakpad-client/linux/crash_generation/crash_generation_server.cc:379
Shared error for almost all crashes we received:
out of memory: 0x0000000773594008 bytes requested
We are trying to allocate around 29.8GB of memory when we retrieve annotations for the child process crash. We should put an arbitrary limit on the length
of the annotations ThinVec
and investigate what explains this recurrent allocation size of 0x0000000773594008
here.
I wonder if this crash could have some relation with bug 1685642 which also started spiking in 116?
Reporter | ||
Updated•1 year ago
|
Assignee | ||
Comment 1•1 year ago
|
||
Bug 1776197 which introduced the new system for pulling crash annotations landed in version 116, so there's could be a relation. I'll put a cap on how many annotations we retrieve from the child process as we already know it.
Assignee | ||
Comment 2•1 year ago
|
||
BTW the logic that finds the annotations in the child process does so by poking around in the ELF headers. Since this is Ubuntu-specific it's possible that something is happening with that. These are Snap builds from what I can tell and I can't be sure if the sandbox - or something else - is messing up the executable layout.
Comment 3•1 year ago
•
|
||
Signatures are not limited to canonical-002
so at least this is not snap-specific but impacts all users
Looks like the libxul.so
path says otherwise, and this could be a data collection issue at crash generation?
Assignee | ||
Comment 4•1 year ago
|
||
Further investigation highlighted that we do get content process crashes from the affected builds - so it's not a build / executable layout problem - however we don't get rdd / utility / socket / gpu crashes. :gerard-majax confirmed that killing the utility process causes the main process to get stuck in a loop, eating a ton of CPU (all in the breakpad server thread, likely trying to read 30 GiB worth of annotations).
Assignee | ||
Comment 5•1 year ago
|
||
Updated•1 year ago
|
Comment 7•1 year ago
|
||
bugherder |
Comment 8•1 year ago
|
||
I just completed a build of snap right before bug 1776197 and one right after, and I can confirm the issue is directly related. Given that even after the current fix there is still a problem, it's not unlikely we are getting blocked by something at AppArmor level that got revealed by the move to bug 1776197
Comment 9•1 year ago
|
||
Set release status flags based on info from the regressing bug 1776197
Comment 10•1 year ago
|
||
The patch landed in nightly and beta is affected.
:gsvelto, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set
status-firefox119
towontfix
.
For more information, please visit BugBot documentation.
Assignee | ||
Comment 11•1 year ago
|
||
Comment on attachment 9354570 [details]
Bug 1854179 - Cap the maximum number of annotations we fetch from a child process r=gerard-majax
Beta/Release Uplift Approval Request
- User impact if declined: The entire browser crashes instead of a single tab (or an invisible process)
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): This only caps the number of annotations we retrieve from the child process. With the patch applied we might retrieve some bogus annotations (most likely none at all), but we won't crash the main process. Note that this affects only Ubuntu users using the Snap build.
- String changes made/needed:
- Is Android affected?: No
Comment 12•1 year ago
|
||
Comment on attachment 9354570 [details]
Bug 1854179 - Cap the maximum number of annotations we fetch from a child process r=gerard-majax
Approved for 119.0b3
Comment 13•1 year ago
|
||
uplift |
Updated•1 year ago
|
Assignee | ||
Comment 14•1 year ago
|
||
I'm CC'ing Ubuntu maintainers. This already went to beta but you might want to pick it up if you want to cut a new release to fix this issue for users. I don't think we can uplift this to release as it wouldn't be released unless we do a chemspill.
Comment 15•1 year ago
|
||
Comment 16•1 year ago
|
||
Gabriele, we have a planned dot release mid-cycle (next Tuesday), should we uplift it into 118.0.2?
Comment 17•1 year ago
|
||
Any reason we shouldn't try_reserve there to do fallible allocation?
Assignee | ||
Comment 18•1 year ago
|
||
(In reply to Pascal Chevrel:pascalc from comment #16)
Gabriele, we have a planned dot release mid-cycle (next Tuesday), should we uplift it into 118.0.2?
Yes, that would help. In the meantime we've identified the root issue and I'm trying to fix it, but it will take some time. This prevents the crash and yields a fairly complete crash report even when we fail to read the child process annotations.
(In reply to Emilio Cobos Álvarez (:emilio) from comment #17)
Any reason we shouldn't try_reserve there to do fallible allocation?
We know the maximum number of annotations that are in the child process so we should enforce that. I will use your suggestion for the code that loads the individual annotations since there's no hard cap on their size and we don't want the child process to cause DOS issues to the main process in case one annotation is too large.
Updated•1 year ago
|
Description
•