Closed Bug 1803675 Opened 2 years ago Closed 2 years ago

Large spike in OOM-like crashes on November 30th on YouTube

Categories

(Core :: General, enhancement)

Tracking

RESOLVED INCOMPLETE
Tracking Status
firefox107 - affected
firefox108 - affected
firefox109 - affected

People

(Reporter: mccr8, Unassigned)

There has been a massive spike in OOM-ish crashes across release, beta and Nightly, starting on November 30th, on both Android and desktop. I've marked some of the existing bugs for these signatures in the "see also" field.

Lots of crash comments mention that YouTube isn't working for them any more.

[@ core::option::expect_failed | alloc::alloc::alloc ] is also spiking up, but I'm not sure it is an OOM issue and it isn't quite as frequent as the others on release.

The signature [@ stackoverflow | mozilla::Internals::GetPrefValue<T> ] also looks related. Lots of comments like "After the new update, FF is unusable with YouTube."

Comments on lots of the DOM-ish variants [@ nsGlobalWindowInner::ClearDocumentDependentSlots ], [@ nsGlobalWindowOuter::SetNewDocument ] and [@ mozilla::dom::JSActorManager::ReceiveRawMessage ] also mention YouTube frequently.

Blocks: media-triage

The mozilla::Internals::GetPrefValue<T> crash looks like it involves infinite recursion in FontFaceSet bindings. I wonder if we hit an OOM at some odd point and ended up with a data structure in a weird configuration that causes us to infinitely loop. Here's an example of that: bp-ac3a6a51-2607-4f64-92c9-251570221130

[Tracking Requested - why for this release]: Some mostly rare OOM-ish crashes suddenly became top crashes overnight on multiple channels (see bug 1405521 comment 24), and YouTube is frequently mentioned.

[@ stackoverflow | js::SharedShape::getInitialShape ] looks like it is another manifestation of the fontface set DOM bindings infinite recursion:
bp-6310245c-bfed-450b-b5de-c545d0221201

Crash Signature: [@ nsGlobalWindowInner::ClearDocumentDependentSlots ][@ nsGlobalWindowOuter::SetNewDocument ][@ mozilla::dom::JSActorManager::ReceiveRawMessage ][@ stackoverflow | mozilla::Internals::GetPrefValue<T> ] → [@ nsGlobalWindowInner::ClearDocumentDependentSlots ][@ nsGlobalWindowOuter::SetNewDocument ][@ mozilla::dom::JSActorManager::ReceiveRawMessage ][@ stackoverflow | mozilla::Internals::GetPrefValue<T> ] [@ stackoverflow | js::SharedShape::getInitialShape ]

The URLs also mention YouTube. The crash spike started at ~2022-11-30T17:37 UTC and got more frequent at ~17:50 (the first crash was at 2022-11-29 21:23:46 UTC). Volume dropped ~6h later; it wasn't a sharp drop, so the problematic code may still have been loaded in pages opened earlier.

Because all release channels were affected:

  1. Could this be related to the Widevine update (bug 1801201), e.g. Firefox needed a restart to complete the update?
  2. Have we heard from YouTube about changes on their end, e.g. their metrics identifying an issue followed by a rollback?

The DOM bindings infinite recursion issue seems to be happening when we fail to allocate a JS object due to an OOM, and Emilio found a place where we seem to not be recovering gracefully from that situation, so that at least explains how an OOM might turn into a stack overflow.
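To make the suspected failure mode concrete, here is a minimal standalone C++ sketch (hypothetical names only, not actual Gecko code) of how an allocation failure that a caching accessor does not propagate can turn into unbounded recursion and a stack overflow:

    // Hypothetical sketch, not Gecko code: a cached-slot accessor that retries
    // through itself when allocation fails instead of reporting the error.
    #include <cstdio>

    struct BindingCache {
      void* mCachedWrapper = nullptr;
    };

    // Stands in for a JS wrapper allocation that fails under OOM.
    static void* TryAllocateWrapper() {
      return nullptr;  // simulate the OOM: allocation always fails
    }

    static void* GetWrapper(BindingCache& aCache, int aDepth) {
      if (aCache.mCachedWrapper) {
        return aCache.mCachedWrapper;
      }
      if (void* wrapper = TryAllocateWrapper()) {
        aCache.mCachedWrapper = wrapper;
        return wrapper;
      }
      // Bug pattern: the failure isn't propagated, so we re-enter the same
      // accessor. The cache slot is still null, and in real code the recursion
      // only ends when the stack overflows -- an OOM surfacing as a
      // stackoverflow crash signature.
      if (aDepth > 10000) {  // depth cap so this demo terminates
        fprintf(stderr, "would have overflowed the stack\n");
        return nullptr;
      }
      return GetWrapper(aCache, aDepth + 1);
    }

    int main() {
      BindingCache cache;
      GetWrapper(cache, 0);
      return 0;
    }

In a pattern like this, the fix would be to report the allocation failure back to the caller (throw or return an error) instead of re-entering the accessor, which matches the kind of missing error handling described above.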

Andrew, is there some way we could tell if these crashes were related to bug 1801201?

Flags: needinfo?(aosmond)

I looked at a handful of these crashes, and some had 4.10.2449.0 and some had 4.10.2557.0 for their gmp-widevinecdm value in the telemetry environment, so there doesn't seem to be a strict correlation to either the old or new version.

Emilio filed bug 1803682 for the infinite recursion, but I suspect that if we fix that, those crashes will just turn into some other OOM.

See Also: → 1803682

Comment 4 says this was a 6-hour spike, so maybe it doesn't need to be tracked, but it would be good to understand what happened here.

(In reply to Andrew McCreight [:mccr8] from comment #8)

Emilio filed bug 1803682 for the infinite recursion, but I suspect that if we fix that, those crashes will just turn into some other OOM.

I think that one (and maybe this one too) is a dupe of bug 1746997.

Well, maybe this one is more about the OOM.

This signature also seems correlated; there are plenty of YouTube URLs in there.

I could believe it, but it would be a bit surprising if it were the Widevine update. We updated Nightly on Nov 21st and saw almost every client updated within a few days (verified through telemetry), yet the crash spike on Nightly started at the same time as on every other channel, 9 days later, even ESR.

Flags: needinfo?(aosmond)

Not tracking against a specific release. As mentioned in comment 4, this affected all releases for a period of time. The spike went away, presumably when YouTube rolled out a fix or rolled back a change.
Could we contact them to see what change they rolled out recently?
Would that help the investigation and let us harden against similar problems in the future?

Summary: Large spike in OOM-like crashes on November 30th → Large spike in OOM-like crashes on November 30th on YouTube

Something similar happening on Reddit but in CSS code, bug 1803876. Maybe it's related, maybe not.

Moving out of media. The signatures here fall into various components, so maybe this bug should be broken up? Overall, though, it appears the cause has gone away.

No longer blocks: media-triage
Component: Audio/Video: Playback → General

The crash signatures are a side effect of high memory usage while using YouTube. I'd guess that it is related to video playback, but we don't know for certain. I was hoping that the media team might have a pre-existing relationship with YouTube to figure out if they deployed and reverted some change to their site, so that we can better understand what went wrong in Firefox.

The bug is linked to topcrash signatures, which match the following criteria:

  • Top 10 content process crashes on release
  • Top 10 desktop browser crashes on nightly
  • Top 10 content process crashes on beta

:freddy, could you consider increasing the severity of this top-crash bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(fbraun)
Keywords: topcrash

Well, if this is a top crash, then yes -- we should increase the severity. Given the variety of clues and hunches, it's just not clear to me what the reasonable next steps should look like.

Severity: -- → S2
Flags: needinfo?(fbraun)

The bug is linked to topcrash signatures, which match the following criteria:

  • Top 10 desktop browser crashes on nightly
  • Top 20 desktop browser crashes on release (startup)
  • Top 10 content process crashes on release

For more information, please visit auto_nag documentation.

There are other bugs on file for these crashes. I don't think keeping this one open serves any purpose, as it looks like nobody is going to look into the YouTube crash spike.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INCOMPLETE