Open Bug 1794059 Opened 2 years ago Updated 1 year ago

Crash in [@ OOM | unknown | mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | IPC::MessageWriter::FatalError]

Categories

(Core :: IPC, defect, P5)

Unspecified
Windows 10
defect

Tracking

Tracking Status
firefox-esr102 --- unaffected
firefox105 --- unaffected
firefox106 --- unaffected
firefox107 --- wontfix
firefox108 --- wontfix
firefox109 --- wontfix

People

(Reporter: mccr8, Unassigned)

References

(Regression)

Details

(Keywords: crash, regression, Whiteboard: [no-nag])

Crash Data

Crash report: https://crash-stats.mozilla.org/report/index/ff2fca57-716f-4f93-b520-e7ae60221006

MOZ_CRASH Reason: IPDL error: "SharedMemory::Create failed!". abort()ing as a result.

Top 10 frames of crashing thread:

0 xul.dll MOZ_Crash mfbt/Assertions.h:261
0 xul.dll mozilla::ipc::FatalError ipc/glue/ProtocolUtils.cpp:173
1 xul.dll mozilla::ipc::IProtocol::HandleFatalError const ipc/glue/ProtocolUtils.cpp:402
2 xul.dll IPC::MessageWriter::FatalError const ipc/chromium/src/chrome/common/ipc_message_utils.h:116
2 xul.dll IPC::MessageBufferWriter::MessageBufferWriter ipc/chromium/src/chrome/common/ipc_message_utils.cc:32
3 xul.dll IPC::ParamTraits<JSStructuredCloneData>::Write ipc/glue/SerializedStructuredCloneBuffer.cpp:25
4 xul.dll IPC::WriteParam ipc/chromium/src/chrome/common/ipc_message_utils.h:291
4 xul.dll IPC::ParamTraits<mozilla::SerializedStructuredCloneBuffer>::Write ipc/glue/SerializedStructuredCloneBuffer.h:77
4 xul.dll IPC::WriteParam ipc/chromium/src/chrome/common/ipc_message_utils.h:291
4 xul.dll IPC::ParamTraits<mozilla::dom::ClonedMessageData>::Write ipc/ipdl/DOMTypes.cpp:113

There are a few of these. It looks like we're running out of resources while trying to send the usual suspects (in terms of humongous messages).

Other crashes with the same signature show up slightly differently: bp-c6f069e7-64d9-43bb-8838-b2c6d0221006

Some of these are crashing in code related to Nika's work on using shared memory to send huge messages, but maybe these would have just crashed sooner before that.

Set release status flags based on info from the regressing bug 1783240

:nika, since you are the author of the regressor, bug 1783240, could you take a look?

For more information, please visit auto_nag documentation.

This crash happens because we failed to create a shared memory region, most likely due to an out-of-memory situation. I presume it shows up as "OOM |" because the "Last Error Value" code is ERROR_COMMITMENT_LIMIT, meaning that creating the shared memory region failed because we're out of commit space.

It might be nice to make this error more explicitly an OOM failure, perhaps by adding some extra overridable virtual methods to IProtocol so that different types of errors can be handled separately, rather than forwarding every error directly to FatalError. That being said, I do expect that these errors would have occurred before, just in different places or with different signatures.
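A minimal sketch of what such a split could look like. Everything here is hypothetical: `ProtocolSketch` is a simplified stand-in for IProtocol, and `IpcErrorKind`, `ClassifyLastError`, `ReportError` and `HandleOutOfMemory` do not exist in the tree. Treating these three Windows error codes as OOM is also an assumption of the sketch.

```cpp
#include <cassert>
#include <string>

// Values from the Windows SDK's winerror.h, redefined here so the sketch
// also compiles on non-Windows hosts.
constexpr unsigned long kErrorNotEnoughMemory = 8;     // ERROR_NOT_ENOUGH_MEMORY
constexpr unsigned long kErrorOutOfMemory = 14;        // ERROR_OUTOFMEMORY
constexpr unsigned long kErrorCommitmentLimit = 1455;  // ERROR_COMMITMENT_LIMIT

// Hypothetical error categories; the real IProtocol has no such enum.
enum class IpcErrorKind { Generic, OutOfMemory };

// Maps a Win32 last-error value to an error category (assumed mapping).
IpcErrorKind ClassifyLastError(unsigned long lastError) {
  switch (lastError) {
    case kErrorNotEnoughMemory:
    case kErrorOutOfMemory:
    case kErrorCommitmentLimit:
      return IpcErrorKind::OutOfMemory;
    default:
      return IpcErrorKind::Generic;
  }
}

// Simplified stand-in for mozilla::ipc::IProtocol, for illustration only.
class ProtocolSketch {
 public:
  virtual ~ProtocolSketch() = default;

  // Entry point a serializer could call instead of going straight to
  // FatalError; it dispatches to a per-category virtual method.
  void ReportError(IpcErrorKind kind, const std::string& msg) {
    assert(!msg.empty());
    switch (kind) {
      case IpcErrorKind::OutOfMemory:
        HandleOutOfMemory(msg);
        break;
      case IpcErrorKind::Generic:
        HandleFatalError(msg);
        break;
    }
  }

 protected:
  virtual void HandleFatalError(const std::string& msg) = 0;

  // Default behaviour: an OOM is still fatal, but it is labelled as such.
  virtual void HandleOutOfMemory(const std::string& msg) {
    HandleFatalError("OOM: " + msg);
  }
};

// Records the last fatal message instead of crashing; keeps the default
// OOM handling.
class FatalOnlyProtocol : public ProtocolSketch {
 public:
  std::string lastFatal;

 protected:
  void HandleFatalError(const std::string& msg) override { lastFatal = msg; }
};

// Overrides the OOM hook so OOM failures are handled separately.
class OomAwareProtocol : public FatalOnlyProtocol {
 public:
  std::string lastOom;

 protected:
  void HandleOutOfMemory(const std::string& msg) override { lastOom = msg; }
};
```

With this shape, a protocol that does nothing special still ends up in its fatal-error path, while one that overrides the OOM hook can annotate the crash or attempt recovery.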

Flags: needinfo?(nika)
Severity: S2 → S3

Duplicate of this bug: 1798641

Copying crash signatures from duplicate bugs.

Crash Signature: [@ OOM | unknown | mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | IPC::MessageWriter::FatalError] → [@ OOM | unknown | mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | IPC::MessageWriter::FatalError] [@ mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | IPC::MessageWriter::FatalError]

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 20 desktop browser crashes on beta

:jld, could you consider increasing the severity of this top-crash bug?

Flags: needinfo?(jld)
Keywords: topcrash

As previously mentioned, this is effectively a variant of the OOM crash, and is otherwise fairly unremarkable. As with OOM crashes, these are largely unrecoverable, so we can't do much about them. At best, we could try something similar to what we do in the allocator: wait for a short period of time and then try again.

Crash Signature: [@ OOM | unknown | mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | IPC::MessageWriter::FatalError] [@ mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | IPC::MessageWriter::FatalError] → [@ OOM | unknown | mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | IPC::MessageWriter::FatalError] [@ mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | IPC::MessageWriter::FatalError]

I think the way forward is probably to implement a trick similar to the one in bug 1716727, and retry allocating shared memory on failure. This would happen around https://searchfox.org/mozilla-central/rev/da01b81351132594e1ad84e50335162f5033c148/ipc/chromium/src/base/shared_memory_win.cc#125-127.

We should probably try to make this follow the same logic as bug 1716727 under the hood, to keep things relatively in-line with one another.
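A rough sketch of the retry idea, assuming a generic "try to create the region" callback. `RetryPolicy` and `CreateWithRetry` are hypothetical names, and the attempt count and delay are placeholders rather than the values used in bug 1716727.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical retry settings; the defaults are illustrative only.
struct RetryPolicy {
  int maxAttempts = 10;
  std::chrono::milliseconds delay{50};
};

// Calls tryCreate until it reports success or the attempts are used up,
// sleeping between failed attempts so that other processes get a chance
// to release commit space. Returns whether creation eventually succeeded
// and, if attemptsUsed is non-null, how many attempts were made.
bool CreateWithRetry(const std::function<bool()>& tryCreate,
                     const RetryPolicy& policy,
                     int* attemptsUsed = nullptr) {
  assert(policy.maxAttempts > 0);
  int used = 0;
  bool ok = false;
  while (used < policy.maxAttempts) {
    ++used;
    if (tryCreate()) {
      ok = true;
      break;
    }
    // No point in sleeping after the final failed attempt.
    if (used < policy.maxAttempts) {
      std::this_thread::sleep_for(policy.delay);
    }
  }
  if (attemptsUsed) {
    *attemptsUsed = used;
  }
  return ok;
}
```

On Windows the callback would wrap the failing shared memory creation call; only if `CreateWithRetry` returns false would the caller fall through to the existing fatal error path.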

Flags: needinfo?(jld)
See Also: → 1716727
Depends on: 1804499

I'm wondering whether this bug happens under normal conditions, when Firefox can unload tabs to reduce memory use, or only because of underlying bugs such as reloading many tabs at once (and opening 200-300 processes) instead of queuing the reloads.

One way to reproduce this bug is to refresh a lot of tabs at once.
Using the Tree Style Tab extension, select all tabs by shift-clicking the bottom tab and then the top tab, then reload them.
With 12 GB of RAM, 600+ tabs are enough to trigger this bug.

The bug is linked to a topcrash signature, which matches the following criteria:

  • Top 20 desktop browser crashes on release (startup)
  • Top 20 desktop browser crashes on beta
  • Top 5 desktop browser crashes on Windows on release

:jld, could you consider increasing the severity of this top-crash bug?

Flags: needinfo?(jld)

Bad bot.

Severity: S3 → S4
Flags: needinfo?(jld)
Priority: -- → P5
Whiteboard: [no-nag]

I just wanted to provide one example (not sure if helpful or not) where I do not believe this is exclusively an OOM issue. In my case, I have a machine with 32GB of memory and while I do run TST with 600-700 tabs, I usually only have 20-40 active tabs. In these cases, Firefox uses only 3-6GB of memory, which by itself isn't a problem, and I don't see any crashes.

However, if I run some other memory-hungry processes at the same time (MS Sandbox, MS Visual Studio, etc.) and total physical memory utilization reaches around 80%, Firefox crashes frequently for me. (Crash 1, Crash 2, Crash 3, Crash 4, Crash 5)

In each of these crash reports, roughly 20% of total physical memory is still available (7.4GB free).

> I just wanted to provide one example (not sure if helpful or not) where I do not believe this is exclusively an OOM issue. In my case, I have a machine with 32GB of memory

The crash logs show that you're running out of page file. It doesn't matter how much RAM you have: if Windows has no page file free, it will refuse to give that memory to applications (for more explanation of this behavior, see e.g. https://hacks.mozilla.org/2022/11/improving-firefox-stability-with-this-one-weird-trick/), and we will OOM because all that nice RAM isn't actually usable for anything but disk cache :-)

That's also why this issue gets worse if you run other applications alongside Firefox: they'll use page file as well.

Comment 8 points out that the trick described in that article could be used here as well, but obviously it's better to fundamentally fix it on your system and make sure the page file matches the RAM.
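To make the arithmetic concrete, here is a small model of commit accounting, assuming the usual approximation that the system commit limit is physical RAM plus page file size. `CommitState`, `CommitLimit` and `CanCommit` are hypothetical names, and the page file size and commit charge in the usage note are invented for illustration, not taken from the crash reports.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kMiB = 1ull << 20;
constexpr std::uint64_t kGiB = 1ull << 30;

// Simplified system-wide view; all values are in bytes.
struct CommitState {
  std::uint64_t physicalTotal;  // installed RAM
  std::uint64_t pageFileSize;   // current page file size
  std::uint64_t commitCharge;   // memory already committed by all processes
};

// Approximation: the commit limit is RAM plus page file.
std::uint64_t CommitLimit(const CommitState& s) {
  return s.physicalTotal + s.pageFileSize;
}

// A new commitment succeeds only if it fits under the commit limit,
// no matter how much physical RAM is currently free.
bool CanCommit(const CommitState& s, std::uint64_t bytes) {
  assert(s.commitCharge <= CommitLimit(s));
  return bytes <= CommitLimit(s) - s.commitCharge;
}
```

For example, with 32 GiB of RAM, a 4 GiB page file and a commit charge 64 MiB below the 36 GiB limit, `CanCommit` rejects a 128 MiB request even if several gigabytes of RAM are physically free, because committed memory does not have to be resident. Raising the page file to 32 GiB lets the same request succeed.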

Depends on: 1822383

(In reply to Gian-Carlo Pascutto [:gcp] from comment #13)

Thank you for your insights and reply on this. You are absolutely right. I bumped up my pagefile (and am now monitoring committed memory) and can see that it was going beyond the old limit. Thank you.

Duplicate of this bug: 1830403
See Also: → 1844574
Duplicate of this bug: 1844574

Copying crash signatures from duplicate bugs.

Crash Signature: [@ OOM | unknown | mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | IPC::MessageWriter::FatalError] [@ mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | IPC::MessageWriter::FatalError] → [@ OOM | unknown | mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | IPC::MessageWriter::FatalError] [@ mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | IPC::MessageWriter::FatalError] [@ mozilla::ipc::FatalErro…

Don't be alarmed by the crash spike under this signature: it's caused by the changes in bug 1746940, which move crashes that were missing symbols to this signature.

Crash Signature: [@ OOM | unknown | mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | IPC::MessageWriter::FatalError] [@ mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | IPC::MessageWriter::FatalError] [@ mozilla::ipc::FatalErro… → [@ OOM | unknown | mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | IPC::MessageWriter::FatalError] [@ mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | IPC::MessageWriter::FatalError] [@ mozilla::ipc::FatalErr…