Closed Bug 1612859 Opened 9 months ago Closed 3 months ago

Crash in [@ OOM | large | NS_ABORT_OOM | Gecko_SetLengthCString | mozilla::URLPreloader::URLEntry::ReadLocation ]

Categories

(Core :: Preferences: Backend, defect, P3)

Unspecified
Windows 10
defect

Tracking


RESOLVED WORKSFORME
Tracking Status
firefox-esr68 --- unaffected
firefox72 --- wontfix
firefox73 --- wontfix
firefox74 --- wontfix

People

(Reporter: pascalc, Unassigned)

References

Details

(Keywords: crash, regression)

Crash Data

This bug is for crash report bp-59d8eb3d-896f-4d3c-a079-4e4050200129.

Top 10 frames of crashing thread:

0 xul.dll NS_ABORT_OOM xpcom/base/nsDebugImpl.cpp:608
1 xul.dll Gecko_SetLengthCString xpcom/string/nsTSubstring.cpp:941
2 xul.dll mozilla::dom::SnappyUncompress dom/localstorage/SnappyUtils.cpp:58
3 xul.dll mozilla::dom::LSValue::Converter::Converter dom/localstorage/LSValue.h:90
4 xul.dll mozilla::dom::LSObserverChild::RecvObserve dom/localstorage/ActorsChild.cpp:143
5 xul.dll mozilla::dom::PBackgroundLSObserverChild::OnMessageReceived ipc/ipdl/PBackgroundLSObserverChild.cpp:205
6 xul.dll mozilla::ipc::PBackgroundChild::OnMessageReceived ipc/ipdl/PBackgroundChild.cpp:5806
7 xul.dll mozilla::ipc::MessageChannel::DispatchMessage ipc/glue/MessageChannel.cpp:2136
8 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1220
9 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:486

This crash has spiked since 72, and although the volume is only in the hundreds, most of the crashes appear to be startup crashes, which means user loss. Filing this since we don't already have a bug on file for this signature.

This looks like a number of unrelated crashes being bucketed together, probably because a compiler or code change now puts Gecko_SetLengthCString at the top of the stack and prevents the buckets from being split. Maybe this used to show up as nsACString_internal::SetCapacity or something similar. I'll file a bug about adding that frame to the signature prefix list.

Splitting by proto signature, it looks like about half of these crashes happen while we're in mozilla::openPrefFile(), specifically inside mozilla::URLPreloader::URLEntry::ReadLocation: bp-b289db5a-6135-4a41-9b53-aad3f0200202

Depends on: 1612921

Half of the crashes have enormous OOM allocation sizes (>100 MiB, sometimes >1 GiB), and these all seem to match the pref-file-loading crashes. It seems very odd that the prefs file would be that large. Could we accidentally be appending ludicrous amounts of data to the pref file? Note that many of these crashes happen at startup and a lot of the remaining ones happen within the first minute, so it might indeed be us trying to load far more than the user's machine can deal with.

Nick, can you think of a scenario where the pref file might become huge? Would it be possible to stream it instead of loading it in one go?
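To make the "stream it instead of loading it in one go" idea concrete, here is a minimal sketch in standard C++ (not the actual Gecko URLPreloader/Preferences code; the function name, path handling, and chunk size are hypothetical) of reading a file in fixed-size chunks rather than sizing one allocation to the whole file:

```cpp
// Hypothetical sketch: chunked file read instead of one file-sized allocation.
#include <fstream>
#include <string>
#include <vector>

bool ReadPrefsInChunks(const std::string& aPath, std::string& aOut) {
  std::ifstream in(aPath, std::ios::binary);
  if (!in) {
    return false;
  }
  std::vector<char> buf(64 * 1024);  // 64 KiB per read, not one multi-GiB block
  while (in.read(buf.data(), buf.size()) || in.gcount() > 0) {
    // Appending still accumulates the whole file; to actually bound peak
    // memory, a streaming parser would consume each chunk here and discard it.
    aOut.append(buf.data(), static_cast<size_t>(in.gcount()));
  }
  return true;
}
```

As the comment notes, streaming only helps if the prefs parser can consume the data incrementally; otherwise the final buffer is just as large as the single read.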

Flags: needinfo?(n.nethercote)

I can't think of such a scenario, sorry.

Flags: needinfo?(n.nethercote)

I am tracking this one for 74 as startup crashes impact user retention. Happy to evaluate an uplift in 74 beta if the investigation gets us to a fix.

FYI, from crash-stats it appears the Preferences-related portion of these crashes started in 72.

First buildid with a crash in 72.0a1 was 20191124212724

I'm tentatively moving this to the preferences backend component since that's where most crashes are taking place.

Component: General → Preferences: Backend

I'll update the crash signature here to the one the pref-file crash will get after the signature change.

Crash Signature: [@ OOM | large | NS_ABORT_OOM | Gecko_SetLengthCString] → [@ OOM | large | NS_ABORT_OOM | Gecko_SetLengthCString] [@ OOM | large | NS_ABORT_OOM | Gecko_SetLengthCString | mozilla::URLPreloader::URLEntry::ReadLocation ]
Summary: Crash in [@ OOM | large | NS_ABORT_OOM | Gecko_SetLengthCString] → Crash in [@ OOM | large | NS_ABORT_OOM | Gecko_SetLengthCString | mozilla::URLPreloader::URLEntry::ReadLocation ]

:njn, can you please triage this and in case assign someone? We are tracking this for 74. Thank you!

Flags: needinfo?(n.nethercote)

The good news here is that the crash rate peaked in late January and has dropped off greatly since then.

To reiterate some things that were said above:

  • All the crashes with the mozilla::URLPreloader::URLEntry::ReadLocation frame suggest the same story: we start the browser, read prefs.js, and OOM on an allocation somewhere between 2 and 4 GiB in size, presumably because prefs.js is that large. There are some repeated crashes coming from the same machine, and the OOM size is the same across all of those crash reports, which is consistent with the huge-prefs.js theory.
  • Crashes without that frame are unrelated.

String prefs have a maximum size of 1 MiB. So if, say, between 2000 and 4000 string prefs each 1 MiB long somehow made it into prefs.js (2000 × 1 MiB ≈ 2 GiB, 4000 × 1 MiB ≈ 4 GiB, matching the observed allocation sizes), then that could lead to the above-mentioned behaviour. Can add-ons still set prefs? If so, a malicious or badly-coded add-on might explain this. It feels like a stretch, but I can't think what else might cause it.

It's not clear to me that this bug is actionable. The only concrete action I can think of is to make the allocation fallible... but then Firefox would have to handle the failure by starting up without any of the old prefs set, i.e. as if the user had created a new profile, which wouldn't be a good experience. So I'm not sure that would be an improvement.
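For illustration, a fallible version of the allocation would look roughly like this. The SetLength(..., mozilla::fallible) overload is the standard Gecko idiom for fallible string growth; the function name and the suggested recovery path (falling back to default prefs) are hypothetical, not what the code does today:

```cpp
// Sketch: fallible buffer allocation so an oversized prefs.js can fail
// gracefully instead of hitting NS_ABORT_OOM (the crash seen in this bug).
#include "nsString.h"
#include "mozilla/fallible.h"

static bool AllocatePrefsBuffer(uint32_t aSize, nsCString& aOut) {
  if (!aOut.SetLength(aSize, mozilla::fallible)) {
    // Allocation failed (e.g. a multi-GiB prefs.js). The caller would have to
    // decide what to do next, e.g. start with default prefs (hypothetical).
    return false;
  }
  return true;
}
```

The trade-off described above still applies: surviving the allocation failure means starting without the user's saved prefs, which may not be better than the crash.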

Flags: needinfo?(n.nethercote)

I am going to untrack this bug for 74 since we have had no crashes in the last 3 betas.

The priority flag is not set for this bug.
:njn, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(n.nethercote)
Priority: -- → P3
Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → WORKSFORME