Open Bug 1880739 Opened 2 years ago Updated 1 year ago

Crash in [@ mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | IPC::ParamTraits<JSStructuredCloneData>::Write]

Categories

(Core :: IPC, defect)

Firefox 122
x86_64
Linux
defect

UNCONFIRMED

People

(Reporter: mirh, Unassigned)

Crash report: https://crash-stats.mozilla.org/report/index/63d5c623-e836-4eb9-9762-20b580240218

MOZ_CRASH Reason: MOZ_CRASH(IPC FatalError in the parent process!)

Top 10 frames of crashing thread:

0  libxul.so  mozilla::ipc::FatalError  /usr/src/debug/firefox/firefox-122.0/ipc/glue/ProtocolUtils.cpp:209
1  libxul.so  mozilla::ipc::IProtocol::HandleFatalError  /usr/src/debug/firefox/firefox-122.0/ipc/glue/ProtocolUtils.cpp:440
2  libxul.so  IPC::ParamTraits<JSStructuredCloneData>::Write  /usr/src/debug/firefox/firefox-122.0/ipc/glue/SerializedStructuredCloneBuffer.cpp:25
3  libxul.so  IPC::WriteParam<JSStructuredCloneData const&>  /usr/src/debug/firefox/firefox-122.0/ipc/chromium/src/chrome/common/ipc_message_utils.h:441
3  libxul.so  IPC::ParamTraits<mozilla::SerializedStructuredCloneBuffer>::Write  /usr/src/debug/firefox/firefox-122.0/obj/dist/include/mozilla/ipc/SerializedStructuredCloneBuffer.h:77
3  libxul.so  IPC::WriteParam<mozilla::SerializedStructuredCloneBuffer const&>  /usr/src/debug/firefox/firefox-122.0/ipc/chromium/src/chrome/common/ipc_message_utils.h:441
3  libxul.so  IPC::ParamTraits<mozilla::dom::ClonedMessageData>::Write  /usr/src/debug/firefox/firefox-122.0/obj/ipc/ipdl/DOMTypes.cpp:127
4  libxul.so  IPC::WriteParam<mozilla::dom::ClonedOrErrorMessageData const&>  /usr/src/debug/firefox/firefox-122.0/ipc/chromium/src/chrome/common/ipc_message_utils.h:441
4  libxul.so  mozilla::dom::PContentParent::SendWindowPostMessage  /usr/src/debug/firefox/firefox-122.0/obj/ipc/ipdl/PContentParent.cpp:6277
5  libxul.so  mozilla::dom::ContentParent::RecvWindowPostMessage  /usr/src/debug/firefox/firefox-122.0/dom/ipc/ContentParent.cpp:7658

Not sure how related this is, but after this crash I hit bug 1514734 on the subsequent browser restart.
Only restarting the browser a second time helped.

OS: Unspecified → Linux
Hardware: Unspecified → x86_64
Version: unspecified → Firefox 122

The Bugbug bot thinks this bug should belong to the 'Core::IPC' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → IPC
Product: Firefox → Core

(In reply to mirh from comment #0)

Crash report: https://crash-stats.mozilla.org/report/index/63d5c623-e836-4eb9-9762-20b580240218
...
Not sure how related this is, but after this crash I hit bug 1514734 on the subsequent browser restart.
Only restarting the browser a second time helped.

This is a crash when trying to dup(...) a file descriptor for a shared memory region in order to send it over IPC. This can be confirmed from the IPCSystemError value, which is EMFILE ("Too many open files").
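
To illustrate the failure mode described above, here is a minimal sketch in C of dup() failing with EMFILE once a process reaches its descriptor limit. It is not Firefox code: the helper name dup_errno_at_fd_limit, the use of /dev/null as the descriptor to duplicate, and the MAX_DUPS cap are all assumptions made for the example. The sketch temporarily lowers the soft RLIMIT_NOFILE so the limit is reached quickly.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

enum { MAX_DUPS = 64 }; /* arbitrary cap for this sketch */

/* Hypothetical helper: lowers the soft RLIMIT_NOFILE to soft_limit, calls
 * dup() until it fails, then closes the duplicates and restores the limit.
 * Returns the errno of the failing dup(), 0 if no dup() failed within
 * MAX_DUPS attempts, or -1 if the setup itself failed. */
int dup_errno_at_fd_limit(rlim_t soft_limit) {
    struct rlimit saved, lowered;
    int dups[MAX_DUPS];
    int count = 0, failure = 0;

    /* Open the descriptor to duplicate before the limit is lowered. */
    int source = open("/dev/null", O_RDONLY);
    if (source < 0) return -1;
    if (getrlimit(RLIMIT_NOFILE, &saved) != 0) { close(source); return -1; }

    /* Only the soft limit changes, so it can be restored afterwards. */
    lowered = saved;
    lowered.rlim_cur = soft_limit < saved.rlim_max ? soft_limit : saved.rlim_max;
    if (setrlimit(RLIMIT_NOFILE, &lowered) != 0) { close(source); return -1; }

    /* dup() returns the lowest free descriptor number; once that number
     * would reach the soft limit, the call fails and sets errno. */
    while (count < MAX_DUPS) {
        int fd = dup(source);
        if (fd < 0) { failure = errno; break; }
        dups[count++] = fd;
    }

    while (count > 0) close(dups[--count]);
    setrlimit(RLIMIT_NOFILE, &saved);
    close(source);
    return failure;
}
```

On Linux, calling dup_errno_at_fd_limit(16) is expected to return EMFILE, the same error code reported for this crash.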

I expect that the subsequent SharedStringMap crash may also have been due to file descriptor exhaustion. Perhaps the restart triggered from the crash reporter ended up keeping some of the descriptors from the old Firefox process around, leaving the new process low on descriptors from the start and requiring another restart.

Leaving a ni? for :gsvelto who might know if there's any risk of us keeping around file descriptors from the pre-crash process after restarting.

Flags: needinfo?(gsvelto)

We're doing a fork()/exec() pair when launching the crash reporter client, and then another one to relaunch Firefox, so all files that haven't been opened with FD_CLOEXEC will be inherited by the new instance. This is something I hadn't thought about but the fix should be easy: use posix_spawnp() like we already do on macOS.
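
For reference, a minimal sketch in C of how the close-on-exec flag is queried and set with fcntl(). A descriptor without FD_CLOEXEC stays open across exec(), which is how descriptors can leak into a relaunched process. The helper names is_cloexec and set_cloexec are hypothetical and are not taken from the Firefox tree.

```c
#include <fcntl.h>

/* Hypothetical helper: returns 1 if fd is marked close-on-exec,
 * 0 if it is not, and -1 if fcntl() fails (for example, fd is not open). */
int is_cloexec(int fd) {
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0) return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}

/* Hypothetical helper: marks fd close-on-exec while preserving any other
 * descriptor flags. Returns 0 on success and -1 on error. */
int set_cloexec(int fd) {
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0 ? -1 : 0;
}
```

A descriptor returned by a plain open() or pipe() starts without the flag, so it has to be marked explicitly, or created with a flag such as O_CLOEXEC where the platform supports it.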

Flags: needinfo?(gsvelto)

(In reply to Gabriele Svelto [:gsvelto] from comment #3)

We're doing a fork()/exec() pair when launching the crash reporter client, and then another one to relaunch Firefox, so all files that haven't been opened with FD_CLOEXEC will be inherited by the new instance. This is something I hadn't thought about but the fix should be easy: use posix_spawnp() like we already do on macOS.

I don't think that will help — POSIX_SPAWN_CLOEXEC_DEFAULT is an Apple-specific extension. We could try to use base::LaunchApp but I don't like the idea of trying to call that from a process that might already have heap corruption.

We could try to be more thorough about setting cloexec even if it's not perfect; if I recall correctly there's still some low-hanging fruit there. Also… we could try to close unexpected fds on startup, and in theory that shouldn't break anything, but it feels dangerous.

(In reply to Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧ from comment #4)

I don't think that will help — POSIX_SPAWN_CLOEXEC_DEFAULT is an Apple-specific extension. We could try to use base::LaunchApp but I don't like the idea of trying to call that from a process that might already have heap corruption.

You're right, I was under the impression that I could always close all spurious files with posix_spawnp() but that's not the case.

We could try to be more thorough about setting cloexec even if it's not perfect; if I recall correctly there's still some low-hanging fruit there. Also… we could try to close unexpected fds on startup, and in theory that shouldn't break anything, but it feels dangerous.

Yes, also probably not worth the fuss.

You're right, I was under the impression that I could always close all spurious files with posix_spawnp() but that's not the case.

After you've generated the crash report, you can presumably clean up all foreign fds before doing the relaunch; it just needs to be done manually.
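
One possible shape for such a manual cleanup, sketched in C under stated assumptions. Instead of closing descriptors directly, this variant marks every open descriptor at or above a starting number as close-on-exec, so the kernel closes them at the next exec(). The helper name mark_cloexec_from and the 65536 scan cap are assumptions made for the example, and descriptors numbered above the cap would be missed.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: marks every open descriptor numbered `lowest` or
 * higher as close-on-exec and returns how many descriptors were marked.
 * It only issues fcntl() calls and does not allocate memory. */
int mark_cloexec_from(int lowest) {
    long max_fd = sysconf(_SC_OPEN_MAX);
    /* Assumption: cap the scan so a very large limit does not stall it. */
    if (max_fd < 0 || max_fd > 65536) max_fd = 65536;

    int marked = 0;
    for (int fd = lowest; fd < max_fd; fd++) {
        int flags = fcntl(fd, F_GETFD);
        if (flags < 0) continue; /* fd is not open */
        if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == 0) marked++;
    }
    return marked;
}
```

Calling mark_cloexec_from(3) just before the relaunch would leave stdin, stdout and stderr alone and let everything else close at exec(). Marking rather than closing avoids pulling descriptors out from under code that is still running in the crashed process.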

The severity field is not set for this bug.
:jld, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(jld)

Not sure if this is the right severity, but it should be close enough. There are a few existing bugs (bug 1467345 in particular) about setting close-on-exec more consistently; the focus there was more on fork/exec of external commands than on restarting Firefox specifically, but most of the ideas apply here.

Severity: -- → S3
Flags: needinfo?(jld)
See Also: → 1467345, 1667748, 1840208