Open Bug 1464902 Opened 2 years ago Updated 1 year ago

Crash in OOM | large | NS_ABORT_OOM | IPC::ParamTraits<nsTSubstring<T>::Read

Categories

(Core :: IPC, defect, P2, critical)

All
Windows
defect

Tracking

Tracking Status
firefox64 --- wontfix
firefox65 --- affected
firefox66 --- affected

People

(Reporter: gsvelto, Unassigned)

Details

(Keywords: crash, Whiteboard: [MemShrink:P2])

Crash Data

This bug was filed from the Socorro interface and is
report bp-fbb24b88-48aa-47ef-9cf0-c860c0180528.
=============================================================

Top 10 frames of crashing thread:

0 xul.dll NS_ABORT_OOM xpcom/base/nsDebugImpl.cpp:614
1 xul.dll IPC::ParamTraits<nsTSubstring<char> >::Read ipc/glue/IPCMessageUtils.h:401
2 xul.dll mozilla::ipc::IPDLParamTraits<mozilla::ipc::InputStreamParams>::Read ipc/ipdl/InputStreamParams.cpp:1269
3 xul.dll mozilla::ipc::IPDLParamTraits<mozilla::ipc::InputStreamParamsWithFds>::Read ipc/ipdl/IPCStream.cpp:59
4 xul.dll mozilla::ipc::IPDLParamTraits<mozilla::ipc::IPCStream>::Read ipc/ipdl/IPCStream.cpp:1073
5 xul.dll mozilla::ipc::IPDLParamTraits<mozilla::ipc::OptionalIPCStream>::Read ipc/ipdl/IPCStream.cpp:1451
6 xul.dll mozilla::ipc::IPDLParamTraits<mozilla::net::HttpChannelOpenArgs>::Read ipc/ipdl/NeckoChannelParams.cpp:2389
7 xul.dll mozilla::ipc::IPDLParamTraits<mozilla::net::HttpChannelCreationArgs>::Read ipc/ipdl/NeckoChannelParams.cpp:3165
8 xul.dll mozilla::net::PNeckoParent::OnMessageReceived ipc/ipdl/PNeckoParent.cpp:881
9 xul.dll mozilla::dom::PContentParent::OnMessageReceived ipc/ipdl/PContentParent.cpp:3319

=============================================================

The reports often look like genuine OOM crashes, with the machine running low on either virtual memory or commit space. However, the allocation size is remarkably large for an IPC message (well over 2 MiB in over half of the reports, sometimes tens of MiB). The frequency is also significant, so it might be worth keeping an eye on.
Component: IPC → Networking: HTTP
This looks like something that needs to be dealt with by Necko, but I could be wrong.
I guess these are actually in a variety of components.
Component: Networking: HTTP → IPC
At least half seem to be in PBackgroundChild::OnMessageReceived.
Crash Signature: [@ OOM | large | NS_ABORT_OOM | IPC::ParamTraits<nsTSubstring<T>::Read] → [@ OOM | large | NS_ABORT_OOM | IPC::ParamTraits<nsTSubstring<T>::Read] [@ OOM | large | NS_ABORT_OOM | IPC::ParamTraits<nsTSubstring<T> >::Read]
Over the past week, ½ of the crashes for this are on 32-bit Windows, ¼ on Android, and ¼ on 64-bit Windows. So it's a lot of memory-constrained and address-space-constrained systems, but not exclusively.

(We don't use e10s on Android as far as I know, but we may be using IPC for inter-thread communication; this is a feature, to allow the same code to work for out-of-process and in-process use cases.)

Specifically a lot of them are in PBackgroundStorageParent, which I think means that content is trying to write a large memory blob (or string?) to local storage; the Necko ones look like they're similar but for a POST.


As far as fixes go: we do have shared memory, which could avoid some copying here, but it doesn't help with address space fragmentation, and there are security implications if you're receiving structured data from a content process (it can retain write access and race you as you parse it). IPC itself has some infrastructure for segmented buffers and avoiding copies on receive, but I believe this is currently used only for structured clone data in MessageManager messages, and I don't know how hard it would be to expose it more generally.
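To make the shared-memory race concrete, here is a minimal sketch, assuming a hypothetical layout of a 32-bit length followed by the payload. `SharedHeader` and `SnapshotPayload` are made-up names, not Gecko APIs. The receiver reads the length once into private memory and copies the payload out before parsing, so a sender that keeps write access cannot change what was validated.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical header a sender places at the start of a shared region.
struct SharedHeader {
  uint32_t payloadLen;
};

// Copies the payload out of the shared region into private memory.
// The length is read exactly once; all later checks use the private copy,
// so a concurrent write to the shared header cannot bypass the bounds check.
// (A real implementation would also need to stop the compiler from
// re-reading shared memory; that is out of scope for this sketch.)
std::optional<std::vector<uint8_t>> SnapshotPayload(const uint8_t* shm,
                                                    size_t shmSize) {
  SharedHeader hdr;
  if (shmSize < sizeof(hdr)) {
    return std::nullopt;  // region too small to hold a header
  }
  std::memcpy(&hdr, shm, sizeof(hdr));  // single read of the length

  if (hdr.payloadLen > shmSize - sizeof(hdr)) {
    return std::nullopt;  // claimed length does not fit in the region
  }
  const uint8_t* begin = shm + sizeof(hdr);
  return std::vector<uint8_t>(begin, begin + hdr.payloadLen);
}
```

Note that this brings back the copy, and with it the large contiguous allocation, that shared memory was supposed to avoid, which is the trade-off described above.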

And, in general, if we're already close enough to OOM that allocating 1-2 MB fails, I don't know how possible it is to fix that.
Whiteboard: [MemShrink]
Whiteboard: [MemShrink] → [MemShrink:P2]
Priority: -- → P2

Adding affected branches. Currently more visible in Fennec.

This is visible on the Fennec 65.0.1 release, with over 16K crashes, but I don't see crashes on 66 or 67. Here are some correlations:

(100.0% in signature vs 05.09% overall) moz_crash_reason = MOZ_CRASH(OOM)
(100.0% in signature vs 46.88% overall) address = 0x0
(59.68% in signature vs 12.94% overall) Module "AudioFlinger::Client (deleted)" = true
(98.69% in signature vs 60.44% overall) Module "libmozavcodec.so" = true
(98.69% in signature vs 60.44% overall) Module "libmozavutil.so" = true
(100.0% in signature vs 69.86% overall) reason = SIGSEGV /SEGV_MAPERR
(36.24% in signature vs 04.38% overall) useragent_locale = es-ES

Crash Signature: [@ OOM | large | NS_ABORT_OOM | IPC::ParamTraits<nsTSubstring<T>::Read] [@ OOM | large | NS_ABORT_OOM | IPC::ParamTraits<nsTSubstring<T> >::Read] → [@ OOM | large | NS_ABORT_OOM | IPC::ParamTraits<nsTSubstring<T> >::Read]

The Android crashes look like they are all PBackgroundStorageParent::OnMessageReceived.
