Open Bug 1458221 Opened 3 years ago Updated 2 years ago
Crash in [@ OOM | small] with mozilla::TelemetryIPCAccumulator::AccumulateChildHistogram spiking in ru locales
This bug was filed from the Socorro interface and is report bp-9570834d-f641-4700-bb4c-809c00180429.
=============================================================
Top 10 frames of crashing thread:

0 mozglue.dll mozalloc_abort memory/mozalloc/mozalloc_abort.cpp:33
1 mozglue.dll mozalloc_handle_oom memory/mozalloc/mozalloc_oom.cpp:54
2 mozglue.dll moz_xrealloc memory/mozalloc/mozalloc.cpp:95
3 xul.dll nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::EnsureCapacity<nsTArrayInfallibleAllocator> xpcom/ds/nsTArray-inl.h:183
4 xul.dll nsTArray_Impl<gfxFontFeature, nsTArrayInfallibleAllocator>::AppendElement<gfxFontFeature&, nsTArrayInfallibleAllocator> xpcom/ds/nsTArray.h:2188
5 xul.dll mozilla::TelemetryIPCAccumulator::AccumulateChildHistogram toolkit/components/telemetry/ipc/TelemetryIPCAccumulator.cpp:153
6 xul.dll `anonymous namespace'::internal_Accumulate toolkit/components/telemetry/TelemetryHistogram.cpp:998
7 xul.dll TelemetryHistogram::Accumulate toolkit/components/telemetry/TelemetryHistogram.cpp:1937
8 xul.dll mozilla::PaintTelemetry::AutoRecordPaint::~AutoRecordPaint layout/painting/nsDisplayList.cpp:10057
9 xul.dll nsRefreshDriver::Tick layout/base/nsRefreshDriver.cpp:2047
=============================================================

There has been a spike in OOM | small content crashes over the last couple of days, coming from 32-bit Windows users of Firefox on ru builds and involving Telemetry code: https://crash-stats.mozilla.com/signature/?useragent_locale=ru&platform=Windows&proto_signature=~mozilla%3A%3ATelemetryIPCAccumulator&signature=OOM%20%7C%20small&date=%3E%3D2018-04-01#graphs

The OOM allocation size is 2,048 bytes most of the time. A couple of user comments refer to tab crashes while playing a game, and some mentioned this one ("candy valley") in particular: https://vk.com/app4523773?from_install=1&loc=apps
It appears that quite a few of the URLs are from this Russian game site: https://ok.ru/game/. I see URLs for all sorts of different games:

* https://ok.ru/game/gardengame
* https://ok.ru/game/vegamix

When I scanned the list, I was hard-pressed to find a URL that wasn't from that particular site.
AutoRecordPaint records to four histograms every time it is destroyed. It is only used in one place, when the view manager has a pending flush. This can happen in a variety of places (including within the refresh driver tick itself). However, I don't think that matters, since the allocation size is so small.

The TelemetryIPCAccumulator accumulates, in each content process, arrays of histogram accumulations and other data that need to be sent to the parent process (where the accumulations actually happen). These arrays are flushed either when they reach a high water mark in size or after 2s. An allocation of 2048 bytes was assumed to be an acceptable part of normal operation. The high water mark for histograms is 5k elements (and we continue recording up to 5x as many accumulations before truncating), and each accumulation struct is 64 bytes in size, so the 2048 B allocation corresponds to an array of 32 elements.

Is this just a case of memory pressure, where we happen to be the unlucky one allocating at the crucial moment?

: https://searchfox.org/mozilla-central/rev/8837610b6c999451435695e800f38d4acbc0a644/layout/base/nsRefreshDriver.cpp#2066
: https://searchfox.org/mozilla-central/rev/8837610b6c999451435695e800f38d4acbc0a644/layout/base/nsRefreshDriver.cpp#2104
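A minimal C++ sketch of the batching scheme described in the comment above, written under stated assumptions: a single-threaded accumulator flushed from a periodic callback, and a high water mark that only *requests* a flush which runs later (which is why accumulations can keep arriving up to the 5x truncation limit). The names `Accumulation`, `BatchingAccumulator`, and `MaybeFlush` are hypothetical and are not Firefox's API; only the numbers (5k high water mark, 5x truncation, 2s interval, 64-byte entries, 2048 B / 64 B = 32) come from the comment. `std::vector` stands in for nsTArray, so a failed growth throws `std::bad_alloc` here instead of aborting through `moz_xrealloc` as in the crash stack.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one queued histogram accumulation, padded to the
// 64 bytes quoted in the bug. The real struct's fields differ.
struct Accumulation {
  std::uint32_t histogramId;
  std::uint32_t sample;
  unsigned char padding[56];
};
static_assert(sizeof(Accumulation) == 64, "matches the quoted 64-byte size");
// A 2048-byte buffer therefore holds 32 accumulations.
static_assert(2048 / sizeof(Accumulation) == 32, "2048 B / 64 B = 32 elements");

// Hypothetical sketch of a per-process accumulation queue.
class BatchingAccumulator {
 public:
  using Clock = std::chrono::steady_clock;

  static constexpr std::size_t kHighWaterMark = 5000;              // request a flush
  static constexpr std::size_t kMaxBuffered = 5 * kHighWaterMark;  // then truncate
  static constexpr std::chrono::milliseconds kFlushInterval{2000};

  // Queues one accumulation. Returns false if it was dropped because the
  // buffer already holds kMaxBuffered entries and no flush has run yet.
  bool Accumulate(std::uint32_t histogramId, std::uint32_t sample,
                  Clock::time_point now) {
    if (mPending.size() >= kMaxBuffered) {
      ++mDropped;
      return false;
    }
    if (mPending.empty()) {
      mBatchStart = now;  // the 2s timer starts with the first queued entry
    }
    // Any growth reallocation happens on this append. With nsTArray's
    // infallible allocator a failed realloc aborts the process instead.
    mPending.push_back(Accumulation{histogramId, sample, {}});
    if (mPending.size() >= kHighWaterMark) {
      mFlushRequested = true;  // assumed to be serviced later, not inline
    }
    return true;
  }

  // Called periodically. Sends the pending batch if the high water mark
  // requested a flush or kFlushInterval has elapsed since the first queued
  // entry. Returns the number of entries sent.
  std::size_t MaybeFlush(Clock::time_point now) {
    assert(mPending.size() <= kMaxBuffered);
    if (mPending.empty()) {
      return 0;
    }
    if (!mFlushRequested && now - mBatchStart < kFlushInterval) {
      return 0;
    }
    std::vector<Accumulation> batch;
    batch.swap(mPending);  // mPending is now empty and owns no buffer
    mFlushRequested = false;
    // Stand-in for sending |batch| to the parent process.
    return batch.size();
  }

  bool FlushRequested() const { return mFlushRequested; }
  std::size_t Pending() const { return mPending.size(); }
  std::size_t Dropped() const { return mDropped; }

 private:
  std::vector<Accumulation> mPending;
  Clock::time_point mBatchStart{};
  bool mFlushRequested = false;
  std::size_t mDropped = 0;
};
```

In this sketch each flushed batch releases its buffer, so the next batch grows again through the same sequence of small allocations. That illustrates the point made in the thread: a process that is already short on memory can fail on one of these ordinary small requests even though the accumulator itself is not what is consuming the memory.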
Yes, most of the reports seem to show a "System memory use percentage" in the 80s and 90s. Curiously, though, the report from comment #0 is at 63% and therefore probably not under particular memory pressure...
P2 for visibility. I don't think there's much we can do here, as it appears we're just unlucky enough to be holding the hot potato. As for the 63%, a recent conversation on the stability list highlights that on Windows we can be killed due to OOM by running out of Commit, not just by running out of used bytes. I'm not sure how likely that is to be the case here, but "memory" is difficult to count :S : https://mail.mozilla.org/private/stability/2018-May/002226.html (may require being a list member)
Priority: -- → P2
From our understanding, we are not causing the problem; we just end up being blamed because of the allocation timing.
Component: Telemetry → General
Priority: P2 → --
Product: Toolkit → Core
Priority: -- → P3