Large spike in OOM-like crashes on November 30th on YouTube
Categories
(Core :: General, enhancement)
People
(Reporter: mccr8, Unassigned)
There has been a massive spike in OOM-ish crashes across release, beta and Nightly, starting on November 30th, on both Android and desktop. I've marked some of the existing bugs for these signatures in the "see also" field.
Lots of crash report comments say that YouTube isn't working for them any more.
[@ core::option::expect_failed | alloc::alloc::alloc ] is also spiking up, but I'm not sure it is an OOM issue and it isn't quite as frequent as the others on release.
The signature [@ stackoverflow | mozilla::Internals::GetPrefValue<T> ] also looks related. Lots of comments like "After the new update, FF is unusable with YouTube."
Comments on lots of the DOM-ish variants [@ nsGlobalWindowInner::ClearDocumentDependentSlots ], [@ nsGlobalWindowOuter::SetNewDocument ] and [@ mozilla::dom::JSActorManager::ReceiveRawMessage ] also mention YouTube frequently.
Reporter
Comment 1•2 years ago
The mozilla::Internals::GetPrefValue<T> crash looks like it involves infinite recursion in FontFaceSet bindings. I wonder if we hit an OOM at some odd point and ended up with a data structure in a weird configuration that causes us to infinitely loop. Here's an example of that: bp-ac3a6a51-2607-4f64-92c9-251570221130
Reporter
Comment 2•2 years ago
[Tracking Requested - why for this release]: Some mostly rare OOM-ish crashes suddenly became top crashes overnight on multiple channels (see bug 1405521 comment 24), and YouTube is frequently mentioned.
Reporter
Comment 3•2 years ago
[@ stackoverflow | js::SharedShape::getInitialShape ] looks like it is another manifestation of the fontface set DOM bindings infinite recursion:
bp-6310245c-bfed-450b-b5de-c545d0221201
Comment 4•2 years ago
The URLs also mention YouTube. The crash spike started at ~2022-11-30T17:37 UTC and got more frequent at ~17:50 (the first crash was at 2022-11-29 21:23:46 UTC). Volume dropped ~6h later. It was not a sharp drop, possibly because the problematic code had already been loaded before then.
Because all release channels were affected:
- Could this be related to the Widevine update (bug 1801201), e.g. Firefox needed a restart to complete the update?
- Have we heard from YouTube about changes on their end and metrics identifying issues followed by a roll back?
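For reference, onset and tail times like the ones above can be read off an hourly histogram of crash timestamps. The following is a minimal Python sketch, not the query actually used here. The function names and the input format (ISO 8601 strings with an explicit UTC offset) are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timezone


def crashes_per_hour(timestamps):
    """Count crash reports per UTC hour.

    `timestamps` is an iterable of ISO 8601 strings that carry an explicit
    offset, e.g. "2022-11-30T17:37:00+00:00" (assumed input format).
    """
    counts = Counter()
    for ts in timestamps:
        moment = datetime.fromisoformat(ts).astimezone(timezone.utc)
        # Truncate to the start of the hour so reports share a bucket.
        counts[moment.replace(minute=0, second=0, microsecond=0)] += 1
    return dict(sorted(counts.items()))


def first_hour_at_or_above(hourly, threshold):
    """Return the first hour whose count reaches `threshold`, or None."""
    for hour, count in hourly.items():
        if count >= threshold:
            return hour
    return None
```

A single early report (like the one on 2022-11-29) stays below any reasonable threshold, so the detected onset lands on the hour where volume actually ramps up.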
Reporter
Comment 5•2 years ago
The DOM bindings infinite recursion issue seems to be happening when we fail to allocate a JS object due to an OOM, and Emilio found a place where we seem to not be recovering gracefully from that situation, so that at least explains how an OOM might turn into a stack overflow.
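As a rough illustration of how an allocation failure could end up as a stack overflow, here is a small Python sketch. It is not the Gecko code: `make_wrapper`, `get_wrapper_buggy`, `get_wrapper_fixed`, and the simulated depth limit are all hypothetical stand-ins. The buggy variant re-enters itself whenever allocation fails, so a persistent OOM never terminates. The fixed variant reports the failure to the caller instead.

```python
class AllocationFailed(Exception):
    """Stands in for an out-of-memory failure while creating a wrapper."""


def make_wrapper(native, can_allocate):
    # Hypothetical allocator: returns None on failure, much like a null
    # object pointer would signal a failed allocation.
    return {"native": native} if can_allocate else None


def get_wrapper_buggy(native, can_allocate, depth=0, limit=50):
    """Retry by recursing when allocation fails (the failure mode).

    `limit` simulates the real stack limit so the sketch stays cheap to run.
    """
    if depth >= limit:
        raise RecursionError("stack overflow (simulated)")
    wrapper = make_wrapper(native, can_allocate)
    if wrapper is None:
        # Nothing has changed since the last attempt, so this recurses forever.
        return get_wrapper_buggy(native, can_allocate, depth + 1, limit)
    return wrapper


def get_wrapper_fixed(native, can_allocate):
    """Propagate the allocation failure instead of retrying."""
    wrapper = make_wrapper(native, can_allocate)
    if wrapper is None:
        raise AllocationFailed("out of memory creating wrapper")
    return wrapper
```

In this sketch the fix only changes which error is reported: the OOM still surfaces, just under a different name.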
Reporter
Comment 6•2 years ago
Andrew, is there some way we could tell if these crashes were related to bug 1801201?
Reporter
Comment 7•2 years ago
I looked at a handful of these crashes, and some had 4.10.2449.0 and some had 4.10.2557.0 for their gmp-widevinecdm value in the telemetry environment, so there doesn't seem to be a strict correlation to either the old or new version.
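A manual check like this could be scaled up by tallying the version across a larger sample of reports. The sketch below is only an assumption about how that might look: it treats each crash report as a flat dict with a "gmp-widevinecdm" key, which is a hypothetical layout and not the actual crash report or telemetry schema.

```python
from collections import Counter


def widevine_versions_by_count(crash_reports):
    """Tally the Widevine CDM version recorded in each crash report.

    Each report is assumed to be a dict; reports without the
    (hypothetical) "gmp-widevinecdm" key are counted as "unknown".
    """
    return Counter(
        report.get("gmp-widevinecdm", "unknown") for report in crash_reports
    )
```

If the crashes were tied to one CDM version, that version should dominate the tally well beyond its share of the general population. A mix of 4.10.2449.0 and 4.10.2557.0 roughly in line with the rollout would argue against a strict correlation.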
Reporter
Comment 8•2 years ago
Emilio filed bug 1803682 for the infinite recursion, but I suspect that if we fix that, those crashes will just turn into some other OOM.
Reporter
Comment 9•2 years ago
Comment 4 says this was a 6 hour spike, so maybe it doesn't need to be tracked, but it would be good to understand what happened here.
Comment 10•2 years ago
(In reply to Andrew McCreight [:mccr8] from comment #8)
> Emilio filed bug 1803682 for the infinite recursion, but I suspect that if we fix that, those crashes will just turn into some other OOM.
I think that one (and maybe this one too) is a dupe of bug 1746997.
Comment 11•2 years ago
Well, maybe this one is more about the OOM.
Comment 12•2 years ago
This signature also seems correlated, plenty of YouTube URLs in there.
Comment 13•2 years ago
I could believe it, but it seems a bit surprising if it was the Widevine update? We updated nightly on Nov 21st and saw almost every client updated within a few days (we verified this through telemetry), but the crash spike started at the same time as every other channel 9 days later, even ESR.
Comment 14•2 years ago
Not tracking against a specific release. As mentioned in Comment 4, this was across all releases for a period of time. The spike went away when presumably YouTube rolled out a fix or rolled back a change?
Could we contact them to see what change they rolled out recently?
Would that help the investigation to harden against similar problems in the future?
Comment 15•2 years ago
Something similar is happening on Reddit, but in CSS code: bug 1803876. Maybe it's related, maybe not.
Comment 16•2 years ago
Moving out of media. The signatures here fall in various components. Maybe this bug should be broken up? Overall though it appears the cause has gone away.
Reporter
Comment 17•2 years ago
The crash signatures are a side effect of high memory usage while using YouTube. I'd guess that it is related to video playback, but we don't know for certain. I was hoping that the media team might have a pre-existing relationship with YouTube to figure out if they deployed and reverted some change to their site, so that we can better understand what went wrong in Firefox.
Comment 18•2 years ago
The bug is linked to topcrash signatures, which match the following criteria:
- Top 10 content process crashes on release
- Top 10 desktop browser crashes on nightly
- Top 10 content process crashes on beta
:freddy, could you consider increasing the severity of this top-crash bug?
For more information, please visit auto_nag documentation.
Comment 19•2 years ago
Well, if this is a top crash, then yes -- we should increase the severity. Given the variety of clues and hunches, it's just not clear to me what the reasonable next steps should look like.
Comment 20•2 years ago
The bug is linked to topcrash signatures, which match the following criteria:
- Top 10 desktop browser crashes on nightly
- Top 20 desktop browser crashes on release (startup)
- Top 10 content process crashes on release
For more information, please visit auto_nag documentation.
Reporter
Comment 21•2 years ago
There are other bugs on file for these crashes. I don't think this bug serves any purpose, as it looks like nobody is going to look at the YouTube crash spike.