Closed Bug 1475218 Opened 6 years ago Closed 6 years ago

Crash in OOM | large | NS_ABORT_OOM | nsTSubstring<T>::SetLength with local storage on Fennec

Categories

(Core :: DOM: Core & HTML, defect, P3)

61 Branch
Unspecified
Android
defect

Tracking

()

RESOLVED FIXED
mozilla63
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- unaffected
firefox61 --- wontfix
firefox62 + fixed
firefox63 --- fixed

People

(Reporter: marcia, Assigned: janv)

References

Details

(Keywords: crash, regression, topcrash)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is
report bp-d916dd12-3690-434d-a774-6039b0180712.
=============================================================

Seen while looking at Fennec release crash stats: https://bit.ly/2NJvHvq. Rising crash in 61.0 that has moved into the #11 slot. So far it appears 62/63 are not affected. 1153 crashes/517 installs so far.

No useful comments. Correlations show as follows:

(100.0% in signature vs 04.64% overall) moz_crash_reason = MOZ_CRASH(OOM)
(100.0% in signature vs 40.37% overall) address = 0x0
(100.0% in signature vs 74.72% overall) reason = SIGSEGV
(43.76% in signature vs 16.19% overall) Module "AudioFlinger::Client (deleted)" = true
(90.15% in signature vs 65.30% overall) Module "libmozavcodec.so" = true
(90.15% in signature vs 65.30% overall) Module "libmozavutil.so" = true

Top 10 frames of crashing thread:

0 libxul.so NS_ABORT_OOM xpcom/base/nsDebugImpl.cpp:614
1 libxul.so nsTSubstring<char16_t>::SetLength xpcom/string/nsTSubstring.cpp:814
2 libxul.so IPC::ParamTraits<nsTSubstring<char16_t> >::Read ipc/glue/IPCMessageUtils.h:460
3 libxul.so mozilla::ipc::PBackgroundParent::OnMessageReceived ipc/ipdl/PBackgroundParent.cpp:1428
4 libmozglue.so RedBlackTree<extent_node_t, ExtentTreeSzTrait>::Insert memory/build/rb.h:394
5 libmozglue.so RedBlackTree<arena_chunk_map_t, ArenaAvailTreeTrait>::MoveRedRight 
6 libmozglue.so RedBlackTree<arena_chunk_map_t, ArenaAvailTreeTrait>::LeanRight 
7 libmozglue.so RedBlackTree<arena_chunk_map_t, ArenaAvailTreeTrait>::Insert memory/build/rb.h:380
8 libxul.so wcsrtombs 
9 libxul.so wcsrtombs 

=============================================================
Those stacks are iffy, but they seem to be going through local storage [1] deserialization. We might want to add `SetLength` to our skip list as it's a pretty generic OOM.

[1] https://crash-stats.mozilla.com/sources/highlight/?url=https://gecko-generated-sources.s3.amazonaws.com/5a57e7442378a146debe7bfd25911e7fa92fb69450d1c02295bd7f388511dbeb36d2426eaf83b7118a896c114089ee94f5023d5d74092ecae47da153d1c8561a/ipc/ipdl/PBackgroundParent.cpp#L-1428
Component: XPCOM → DOM
Priority: -- → P3
This sounds very similar to bug 1462162, where heavy local storage activity ends up using a lot of memory in this deserialization code. It is a little different, in that there is no preallocated process involved, because it is Fennec.

Jan, will bug 1462162 help with the non-e10s case?
Flags: needinfo?(jvarga)
(In reply to Andrew McCreight [:mccr8] from comment #2)
> This sounds very similar to bug 1462162, where heavy local storage activity
> ends up using a lot of memory in this deserialization code. It is a little
> different, in that there is no preallocated process involved, because it is
> Fennec.
> 
> Jan, will bug 1462162 help with the non-e10s case?

My patch optimizes broadcasting of local storage changes across multiple content processes in the e10-multi case (not just the preallocated process).
non-e10s case is not handled in the patch.
However, I think we could do something special for Fennec, since if there are no multiple content processes, we don't need to synchronize local storage caches at all, there's just one cache.
Flags: needinfo?(jvarga)
See Also: → 1462162
Summary: Crash in OOM | large | NS_ABORT_OOM | nsTSubstring<T>::SetLength → Crash in OOM | large | NS_ABORT_OOM | nsTSubstring<T>::SetLength with local storage on Fennec
Updated the status flags since there are a handful of crashes in 62 and at least one recently in 63.
In Fennec 62b9 this is the #3 browser top crash.
Jan, can you take another look here or help find someone to investigate? This crash signature has increased a lot in recent betas.
Flags: needinfo?(jvarga)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #6)
> Jan, can you take another look here or help find someone to investigate?
> This crash signature has increased a lot in recent betas.

I just requested a beta approval in bug 1462162. It has to land on the 62 branch first, then we will see if it fixes this crash too. If it doesn't fix the crash, I'll add additional optimizations for Fennec.
Flags: needinfo?(jvarga)
Andrew, since Fennec doesn't use e10s, should we just avoid creating PBackgroundLocalStorageCache actors completely in that case ?
There can be only one Local Storage Cache, so there's nothing to synchronize.
Flags: needinfo?(bugmail)
Agreed that in Fennec the broadcast mechanism is just overhead and it makes sense to disable the broadcasts/broadcast mechanism.

That said, the allocation sizes here that are failing are "just" 1.2M and 2.2M and the signatures here also are showing up under case PBackgroundStorage::Msg_AsyncUpdateItem__ID, and that's one we can't be rid of (because that's the actual LocalStorageg storage command, not the redundant broadcast).  If this is one of those cases :janv observed like the vidyo web beta concatenating a log and repeatedly storing it in localstorage, it's possible the content side allocations are under control thanks to string ropes just making many small allocations.

Given armeabi-v7a is a 32-bit arch, maybe we should consider some kind of proactive intervention (https://github.com/WICG/interventions) for large LocalStorage values.  Maybe only when the address space is "at risk".  Or maybe we should consider it simply because, as in bugs like bug 857888, we don't expect devices running Fennec to have the same level of available storage and so storing huge values is mis-use.  In any event, I'll try and factor this into my review of next-gen localstorage... it's likely the situation for fennec got worse once we switched to unified LocalStorage handling via PBackground...
Flags: needinfo?(bugmail)
Andrew are you taking this on? Or maybe Jan?

It's still the #4 top crash in beta 62 so I expect it will look pretty bad on release.  On release 61 we can see around 16000 crashes per week, which is very high.
Flags: needinfo?(jvarga)
Flags: needinfo?(bugmail)
Keywords: topcrash
Assigning to Jan because I know he'd much rather focus on the reviews I owe him. :)  Leaving needinfo up for next steps.
Assignee: nobody → jvarga
Status: NEW → ASSIGNED
Flags: needinfo?(bugmail)
Agreed.
Attached patch patchSplinter Review
Andrew, this patch just disables PBackgroundLocalStorageCache on Android. I think it should be enough for now. We'll see how it affects number of OOM crashes, then we can decide what to do next.

try push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=07e0bc5ecfc5fd36b90eeb3c109987a1504391da
Flags: needinfo?(jvarga)
Attachment #8998890 - Flags: review?(bugmail)
Hello Andrew and Jan - We are getting later in the 62 cycle. Are we going to try to land this to see if it improves the crash situation before 62 goes to release? Thanks.
Attachment #8998890 - Flags: review?(bugmail) → review+
Pushed by jvarga@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a2181b2ee1fc
Disable PBackgroundLocalStorageCache on Android; r=asuth
Flags: needinfo?(jvarga)
https://hg.mozilla.org/mozilla-central/rev/a2181b2ee1fc
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Please request Beta approval on this when you get a chance.
Ok, looks like nightly is now getting less crashes, going to request beta approval.
Flags: needinfo?(jvarga)
Comment on attachment 8998890 [details] [diff] [review]
patch

Approval Request Comment
[Feature/Bug causing the regression]: bug 1285898
[User impact if declined]: Fennec users may still experience crashes related to LocalStorage.
[Is this code covered by automated tests?]: Yes (mochitests and wpt tests)
[Has the fix been verified in Nightly?]: Yes (and it seems number of crashes is declining in Nightly)
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: No (very low risk).
[Why is the change risky/not risky?]: Changes are small, only affect Fennec and landed on m-c almost a week ago.
[String changes made/needed]: None.
Attachment #8998890 - Flags: approval-mozilla-beta?
Comment on attachment 8998890 [details] [diff] [review]
patch

Helps reduce OOM crashes on Fennec. Approved for Fennec 62.0b21.
Attachment #8998890 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
See Also: → 1498194
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: