Crash in [@ OOM | unknown | mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | IPC::MessageWriter::FatalError]
Categories
(Core :: IPC, defect, P5)
Tracking
| Flag | Tracking | Status |
|---|---|---|
| firefox-esr102 | --- | unaffected |
| firefox105 | --- | unaffected |
| firefox106 | --- | unaffected |
| firefox107 | --- | wontfix |
| firefox108 | --- | wontfix |
| firefox109 | --- | wontfix |
People
(Reporter: mccr8, Unassigned)
References
(Regression)
Details
(Keywords: crash, regression, Whiteboard: [no-nag])
Crash Data
Crash report: https://crash-stats.mozilla.org/report/index/ff2fca57-716f-4f93-b520-e7ae60221006
MOZ_CRASH Reason: IPDL error: "SharedMemory::Create failed!". abort()ing as a result.
Top 10 frames of crashing thread:
0 xul.dll MOZ_Crash mfbt/Assertions.h:261
0 xul.dll mozilla::ipc::FatalError ipc/glue/ProtocolUtils.cpp:173
1 xul.dll mozilla::ipc::IProtocol::HandleFatalError const ipc/glue/ProtocolUtils.cpp:402
2 xul.dll IPC::MessageWriter::FatalError const ipc/chromium/src/chrome/common/ipc_message_utils.h:116
2 xul.dll IPC::MessageBufferWriter::MessageBufferWriter ipc/chromium/src/chrome/common/ipc_message_utils.cc:32
3 xul.dll IPC::ParamTraits<JSStructuredCloneData>::Write ipc/glue/SerializedStructuredCloneBuffer.cpp:25
4 xul.dll IPC::WriteParam ipc/chromium/src/chrome/common/ipc_message_utils.h:291
4 xul.dll IPC::ParamTraits<mozilla::SerializedStructuredCloneBuffer>::Write ipc/glue/SerializedStructuredCloneBuffer.h:77
4 xul.dll IPC::WriteParam ipc/chromium/src/chrome/common/ipc_message_utils.h:291
4 xul.dll IPC::ParamTraits<mozilla::dom::ClonedMessageData>::Write ipc/ipdl/DOMTypes.cpp:113
There are a few of these. It looks like we're running out of resources while trying to send the usual suspects (in terms of humongous messages).
Other crashes with the same signature show up slightly differently: bp-c6f069e7-64d9-43bb-8838-b2c6d0221006
Some of these are crashing in code related to Nika's work on using shared memory to send huge messages, but maybe these would have just crashed sooner before that.
Updated•2 years ago
Comment 1•2 years ago
Set release status flags based on info from the regressing bug 1783240
:nika, since you are the author of the regressor, bug 1783240, could you take a look?
For more information, please visit auto_nag documentation.
Comment 2•2 years ago
This crash occurs because we failed to create a shared memory region, which is likely due to an out-of-memory situation. I presume the reason it's showing up under the OOM signature is that the "Last Error Value" code is ERROR_COMMITMENT_LIMIT, meaning the creation of the shared memory region is failing because we're out of commit space.
It might be nice to make this error more explicitly an OOM failure, perhaps by adding some extra overridable virtual methods to IProtocol to allow handling different types of errors separately, rather than forwarding the error directly to FatalError. That being said, I do expect that these errors would have occurred before, just in different places or with different signatures.
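A minimal sketch of that idea, using a simplified stand-in for `IProtocol` (the real class lives in ipc/glue/ProtocolUtils.h; the `HandleResourceError` hook and the subclass below are hypothetical, not Gecko's actual API):

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Simplified stand-in for mozilla::ipc::IProtocol.
class Protocol {
 public:
  virtual ~Protocol() = default;

  // Existing behavior: any serialization failure is fatal.
  virtual void HandleFatalError(const std::string& aMsg) {
    // In Gecko this reaches mozilla::ipc::FatalError, which MOZ_CRASHes.
    std::abort();
  }

  // Hypothetical new hook: resource-exhaustion failures get their own path,
  // so subclasses could drop the message or annotate the crash as an OOM
  // instead of reporting a generic IPDL fatal error.
  virtual void HandleResourceError(const std::string& aMsg) {
    HandleFatalError(aMsg);  // default: behavior unchanged
  }
};

// Illustrative subclass that chooses to survive a resource failure.
class ShmemFailureTolerantActor : public Protocol {
 public:
  bool sawResourceError = false;
  void HandleResourceError(const std::string&) override {
    sawResourceError = true;  // recover instead of crashing
  }
};
```

Callers inside the serialization code would then route commit-limit failures through `HandleResourceError` while other protocol violations keep going straight to `HandleFatalError`.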
Updated•2 years ago
Comment 3•2 years ago
Set release status flags based on info from the regressing bug 1783240
Updated•2 years ago
Comment 5•2 years ago
Copying crash signatures from duplicate bugs.
Comment 6•2 years ago
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 20 desktop browser crashes on beta
:jld, could you consider increasing the severity of this top-crash bug?
For more information, please visit auto_nag documentation.
Comment 7•2 years ago
As previously mentioned, this is effectively a variant of the OOM crash and is otherwise fairly unremarkable. Like OOM crashes, these are largely unrecoverable, so we can't do much about them. At best, we could try to do something similar to what we do in the allocator: wait and retry after a short period of time.
Comment 8•2 years ago
I think the way forward is probably to implement a similar trick to the one in bug 1716727, and re-try allocating shared memory on failure. This would happen around https://searchfox.org/mozilla-central/rev/da01b81351132594e1ad84e50335162f5033c148/ipc/chromium/src/base/shared_memory_win.cc#125-127.
We should probably try to make this follow the same logic as bug 1716727 under the hood, to keep things relatively in-line with one another.
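A portable sketch of that retry idea, under the assumption that the actual fix would wrap the `CreateFileMapping` call at the location linked above; the function name, attempt count, and delay below are illustrative placeholders, not the tuned values from bug 1716727:

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Retry a commit-limited allocation a few times with short pauses, mirroring
// the allocator's "stall and retry" trick from bug 1716727. The caller passes
// the actual creation attempt as a callable returning success/failure.
bool CreateWithRetry(const std::function<bool()>& tryCreate,
                     int maxAttempts = 4,
                     std::chrono::milliseconds delay =
                         std::chrono::milliseconds(50)) {
  for (int attempt = 0; attempt < maxAttempts; ++attempt) {
    if (tryCreate()) {
      return true;  // allocation succeeded
    }
    if (attempt + 1 < maxAttempts) {
      // Another process may release commit space shortly; stall, then retry.
      std::this_thread::sleep_for(delay);
    }
  }
  return false;  // still failing: fall through to the existing error path
}
```

The key design point, as in the allocator, is that the stall is short and bounded: if commit space never frees up, the call still fails and the existing OOM handling takes over.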
Updated•2 years ago
Updated•2 years ago
Comment 9•2 years ago
I'm wondering if this bug happens in normal conditions, when Firefox can unload tabs to reduce memory use, or only because of underlying bugs like reloading many tabs at once (and opening 200-300 processes) instead of queuing it.
One way to reproduce this bug is to refresh a lot of tabs: using the Tree Style Tabs extension, select all tabs by shift-clicking the bottom tab and then the top tab, and reload them. With 12 GB of RAM, 600+ tabs is enough to trigger this bug.
Comment 10•2 years ago
The bug is linked to a topcrash signature, which matches the following criteria:
- Top 20 desktop browser crashes on release (startup)
- Top 20 desktop browser crashes on beta
- Top 5 desktop browser crashes on Windows on release
:jld, could you consider increasing the severity of this top-crash bug?
For more information, please visit auto_nag documentation.
Updated•2 years ago
Updated•2 years ago
Updated•2 years ago
Comment 12•2 years ago
I just wanted to provide one example (not sure if helpful or not) where I do not believe this is exclusively an OOM issue. In my case, I have a machine with 32GB of memory, and while I do run TST with 600-700 tabs, I usually only have 20-40 active tabs. In these cases, FF is only using 3-6GB of memory, which by itself isn't a problem, and I don't see any crashes.
However, if I run some other bigger memory processes at the same time (MS Sandbox, MS Visual Studio, etc.) and get the total physical memory to around 80% utilization, Firefox crashes frequently for me. (Crash 1, Crash 2, Crash 3, Crash 4, Crash 5)
In each of these crash reports, the total physical memory still available is around ~20% (7.4GB free).
Comment 13•2 years ago
> I just wanted to provide one example (not sure if helpful or not) where I do not believe this is exclusively an OOM issue. In my case, I have a machine with 32GB of memory
The crash logs show that you're running out of page file. It doesn't matter how much RAM you have: if Windows has no page file free, it will refuse to give that memory to applications (for some more explanation of this behavior, see e.g. https://hacks.mozilla.org/2022/11/improving-firefox-stability-with-this-one-weird-trick/), and we will OOM because all that nice RAM isn't actually usable for anything but disk cache :-)
That's also why this issue gets worse if you run other applications alongside Firefox: they'll use page file as well.
Comment 8 points out that the trick described in that article could be used here as well, but obviously it's better to fundamentally fix it on your system and make sure the page file matches the RAM.
Comment 14•2 years ago
(In reply to Gian-Carlo Pascutto [:gcp] from comment #13)
> > I just wanted to provide one example (not sure if helpful or not) where I do not believe this is exclusively an OOM issue. In my case, I have a machine with 32GB of memory
> The crash logs show that you're running out of page file. It doesn't matter how much RAM you have, if Windows has no page file free it will refuse to give that memory to applications (for some more explanation of this behavior, see e.g. https://hacks.mozilla.org/2022/11/improving-firefox-stability-with-this-one-weird-trick/) and we will OOM because all that nice RAM isn't actually usable for anything but disk cache :-)
> That's also why this issue gets worse if you run other applications alongside Firefox: they'll use page file as well.
> Comment 8 points out that the trick described in that article could be used here as well, but obviously it's better to fundamentally fix it on your system and make sure the page file matches the RAM.
Thank you for your insights and reply on this. You are absolutely right. I bumped up my pagefile (and now monitoring the committed memory) and see that it was going beyond the old number. Thank you.
Comment 17•2 years ago
Copying crash signatures from duplicate bugs.
Comment 18•1 year ago
Don't be alarmed by the crash spike under this signature; it's caused by the changes in bug 1746940, which are moving crashes that were missing symbols to this signature.