Crash in [@ mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | IPC::ParamTraits<JSStructuredCloneData>::Write]
Categories
(Core :: IPC, defect)
Tracking
()
People
(Reporter: mirh, Unassigned)
References
Details
Crash report: https://crash-stats.mozilla.org/report/index/63d5c623-e836-4eb9-9762-20b580240218
MOZ_CRASH Reason: MOZ_CRASH(IPC FatalError in the parent process!)
Top 10 frames of crashing thread:
0 libxul.so mozilla::ipc::FatalError /usr/src/debug/firefox/firefox-122.0/ipc/glue/ProtocolUtils.cpp:209
1 libxul.so mozilla::ipc::IProtocol::HandleFatalError /usr/src/debug/firefox/firefox-122.0/ipc/glue/ProtocolUtils.cpp:440
2 libxul.so IPC::ParamTraits<JSStructuredCloneData>::Write /usr/src/debug/firefox/firefox-122.0/ipc/glue/SerializedStructuredCloneBuffer.cpp:25
3 libxul.so IPC::WriteParam<JSStructuredCloneData const&> /usr/src/debug/firefox/firefox-122.0/ipc/chromium/src/chrome/common/ipc_message_utils.h:441
3 libxul.so IPC::ParamTraits<mozilla::SerializedStructuredCloneBuffer>::Write /usr/src/debug/firefox/firefox-122.0/obj/dist/include/mozilla/ipc/SerializedStructuredCloneBuffer.h:77
3 libxul.so IPC::WriteParam<mozilla::SerializedStructuredCloneBuffer const&> /usr/src/debug/firefox/firefox-122.0/ipc/chromium/src/chrome/common/ipc_message_utils.h:441
3 libxul.so IPC::ParamTraits<mozilla::dom::ClonedMessageData>::Write /usr/src/debug/firefox/firefox-122.0/obj/ipc/ipdl/DOMTypes.cpp:127
4 libxul.so IPC::WriteParam<mozilla::dom::ClonedOrErrorMessageData const&> /usr/src/debug/firefox/firefox-122.0/ipc/chromium/src/chrome/common/ipc_message_utils.h:441
4 libxul.so mozilla::dom::PContentParent::SendWindowPostMessage /usr/src/debug/firefox/firefox-122.0/obj/ipc/ipdl/PContentParent.cpp:6277
5 libxul.so mozilla::dom::ContentParent::RecvWindowPostMessage /usr/src/debug/firefox/firefox-122.0/dom/ipc/ContentParent.cpp:7658
Not sure how closely related this is, but after this crash I hit bug 1514734 on the next browser restart.
Only restarting a second time helped.
Comment 1•2 years ago
The Bugbug bot thinks this bug should belong to the 'Core::IPC' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
Comment 2•2 years ago
(In reply to mirh from comment #0)
> Crash report: https://crash-stats.mozilla.org/report/index/63d5c623-e836-4eb9-9762-20b580240218
> ...
> Not sure how closely related this is, but after this crash I hit bug 1514734 on the next browser restart.
> Only restarting a second time helped.
This is a crash when trying to dup(...) a file descriptor for a shared memory region to send it over IPC. This can be confirmed with the IPCSystemError which is EMFILE ("Too many open files").
I expect that the subsequent SharedStringMap crash may have also been due to system file descriptor exhaustion. Perhaps the restart after the crash reporter ended up keeping some of the descriptors from the old Firefox process around, leading to the process being low on descriptors to start, and requiring another restart.
Leaving a ni? for :gsvelto who might know if there's any risk of us keeping around file descriptors from the pre-crash process after restarting.
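For anyone following along, here is a minimal sketch of the failure mode described above. It is not the Firefox code path. The helper name `ErrnoWhenDescriptorsRunOut` and the soft limit of 32 are assumptions chosen for illustration. It lowers the soft RLIMIT_NOFILE, calls dup() in a loop, and reports the errno of the first failure, which should be EMFILE.

```cpp
#include <algorithm>
#include <cerrno>
#include <vector>
#include <sys/resource.h>
#include <unistd.h>

// Hypothetical helper: duplicates a pipe end until dup() fails and returns
// the errno of that failure (0 if the setup itself failed).
int ErrnoWhenDescriptorsRunOut() {
  int pipe_fds[2];
  if (pipe(pipe_fds) != 0) return 0;

  struct rlimit original{};
  if (getrlimit(RLIMIT_NOFILE, &original) != 0) {
    close(pipe_fds[0]);
    close(pipe_fds[1]);
    return 0;
  }

  // Shrink only the soft limit so the loop ends quickly (32 is arbitrary).
  struct rlimit lowered = original;
  lowered.rlim_cur = std::min<rlim_t>(original.rlim_cur, 32);
  if (setrlimit(RLIMIT_NOFILE, &lowered) != 0) {
    close(pipe_fds[0]);
    close(pipe_fds[1]);
    return 0;
  }

  std::vector<int> duplicates;
  int failure = 0;
  for (;;) {
    int fd = dup(pipe_fds[0]);
    if (fd < 0) {
      failure = errno;  // expected: EMFILE once the per-process limit is hit
      break;
    }
    duplicates.push_back(fd);
  }

  // Release everything and restore the original limit.
  for (int fd : duplicates) close(fd);
  setrlimit(RLIMIT_NOFILE, &original);
  close(pipe_fds[0]);
  close(pipe_fds[1]);
  return failure;
}
```

A process that starts out with many inherited descriptors reaches this state sooner, which is consistent with the theory about the second crash.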
Comment 3•2 years ago
We're doing a fork()/exec() pair when launching the crash reporter client, and then another one to relaunch Firefox, so all file descriptors that weren't opened with FD_CLOEXEC will be inherited by the new instance. This is something I hadn't thought about, but the fix should be easy: use posix_spawnp() like we already do on macOS.
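To make the inheritance rule concrete, a small sketch using plain fcntl(): a descriptor survives exec() unless FD_CLOEXEC is set on it. The helper names `SurvivesExec` and `SetCloseOnExec` are hypothetical and are not taken from the Firefox tree.

```cpp
#include <fcntl.h>
#include <unistd.h>

// True if `fd` is open and lacks FD_CLOEXEC, i.e. it would be inherited
// across exec().
bool SurvivesExec(int fd) {
  int flags = fcntl(fd, F_GETFD);
  return flags != -1 && (flags & FD_CLOEXEC) == 0;
}

// Marks `fd` close-on-exec while preserving any other descriptor flags.
bool SetCloseOnExec(int fd) {
  int flags = fcntl(fd, F_GETFD);
  if (flags == -1) return false;
  return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) != -1;
}
```

A descriptor from a plain pipe() reports true from `SurvivesExec` until `SetCloseOnExec` has been called on it.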
Comment 4•2 years ago
(In reply to Gabriele Svelto [:gsvelto] from comment #3)
> We're doing a fork()/exec() pair when launching the crash reporter client, and then another one to relaunch Firefox, so all file descriptors that weren't opened with FD_CLOEXEC will be inherited by the new instance. This is something I hadn't thought about, but the fix should be easy: use posix_spawnp() like we already do on macOS.
I don't think that will help — POSIX_SPAWN_CLOEXEC_DEFAULT is an Apple-specific extension. We could try to use base::LaunchApp but I don't like the idea of trying to call that from a process that might already have heap corruption.
We could try to be more thorough about setting cloexec even if it's not perfect; if I recall correctly there's still some low-hanging fruit there. Also… we could try to close unexpected fds on startup, and in theory that shouldn't break anything, but it feels dangerous.
Comment 5•2 years ago
(In reply to Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧ from comment #4)
> I don't think that will help; POSIX_SPAWN_CLOEXEC_DEFAULT is an Apple-specific extension. We could try to use base::LaunchApp but I don't like the idea of trying to call that from a process that might already have heap corruption.
You're right, I was under the impression that I could always close all spurious files with posix_spawnp() but that's not the case.
> We could try to be more thorough about setting cloexec even if it's not perfect; if I recall correctly there's still some low-hanging fruit there. Also… we could try to close unexpected fds on startup, and in theory that shouldn't break anything, but it feels dangerous.
Yes, also probably not worth the fuss.
Comment 6•2 years ago
> You're right, I was under the impression that I could always close all spurious files with posix_spawnp() but that's not the case.
After you've generated the crash report, you can presumably clean up all foreign fds before doing the relaunch; you just need to do it manually.
Comment 7•2 years ago
The severity field is not set for this bug.
:jld, could you have a look please?
For more information, please visit BugBot documentation.
Comment 8•1 year ago
Not sure if this is the right severity, but it should be close enough. There are a few existing bugs (and bug 1467345 in particular) about setting close-on-exec more consistently; the focus there was more on fork/exec of external commands and not specifically about restarting Firefox but most of the ideas apply here.
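For reference, "setting close-on-exec more consistently" usually means requesting the flag when the descriptor is created, so there is no window in which another thread can fork() and leak it. A short sketch using the POSIX O_CLOEXEC and F_DUPFD_CLOEXEC options follows. The wrapper names are hypothetical and do not come from bug 1467345.

```cpp
#include <fcntl.h>
#include <unistd.h>

// Opens `path` read-only with close-on-exec set atomically at creation.
// Returns the descriptor, or -1 on failure.
int OpenCloseOnExec(const char* path) {
  return open(path, O_RDONLY | O_CLOEXEC);
}

// Duplicates `fd` onto the lowest free descriptor with FD_CLOEXEC already
// set on the copy. Returns the new descriptor, or -1 on failure.
int DupCloseOnExec(int fd) {
  return fcntl(fd, F_DUPFD_CLOEXEC, 0);
}
```

A descriptor that must be passed to a child on purpose can have the flag cleared again just before the exec().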