Open Bug 1968418 Opened 5 months ago Updated 3 months ago

Closing a long-running twitch tab hangs browser for minute(s) (GC in recvConduitClosed of Conduit)

Categories

(WebExtensions :: General, defect, P3)

Tracking

(Performance Impact:pending-needinfo)

People

(Reporter: mozbugs, Unassigned, NeedInfo)

References

(Blocks 1 open bug)

Details

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:138.0) Gecko/20100101 Firefox/138.0

Steps to reproduce:

Watch a twitch stream, leave the tab open while you fall asleep. Close the tab the next morning.

Actual results:

The browser hangs for about a minute (or more, depending on how long the twitch tab was open) right after closing the tab.

I have a profile of such an event: https://share.firefox.dev/43pKbIB

(In this case I had two windows open, each with a twitch tab (and one new tab to keep that window open), and I closed the twitch tab whose stream had ended. You can see the other stream keep playing for a couple of seconds after the jank has started...)

Expected results:

No jank/hang.

Component: Untriaged → Performance: Responsiveness
Product: Firefox → Core

This seems pretty serious. But maybe it only happens when certain add-ons are used - it would still be a Firefox bug, but it would affect fewer people so it would be less severe.

I can see at least one add-on related to Twitch being used. Can you check if this also happens without add-ons?

Status: UNCONFIRMED → NEW
Performance Impact: --- → pending-needinfo
Ever confirmed: true
Flags: needinfo?(mozbugs)

Rob, according to the profile the parent process hang is happening inside recvConduitClosed, specifically in this filter loop in _cast: https://searchfox.org/mozilla-central/rev/a287c0313cea972ea8e71a4d5def3462af3feffa/toolkit/components/extensions/ConduitsParent.sys.mjs#344

What could cause a high number of elements in this array? Can we do something to change what is presumably O(n^2) behavior?

Flags: needinfo?(rob)
Component: Performance: Responsiveness → General
Product: Core → WebExtensions
Version: Firefox 138 → Trunk

Urja, how often do you encounter this issue?

(In reply to Markus Stange [:mstange] from comment #2)

> Rob, according to the profile the parent process hang is happening inside recvConduitClosed, specifically in this filter loop in _cast: https://searchfox.org/mozilla-central/rev/a287c0313cea972ea8e71a4d5def3462af3feffa/toolkit/components/extensions/ConduitsParent.sys.mjs#344
>
> What could cause a high number of elements in this array? Can we do something to change what is presumably O(n^2) behavior?

The flame graph shows that most of the time is spent in the port filter, and earlier calls show that sendPortDisconnect is the caller. This is not a sign of there being too many ports, or else the profile would also include lots of IPC after that. I also audited all extensions in the profile, and none makes an excessive number of browser.runtime.connect() or browser.tabs.connect() calls that would create so many ports.

This _cast method is the central implementation of the following parts of the extension messaging APIs:

Although there is usually only one recipient, or at most a few, the generic implementation iterates over all entries in Hub.remotes. There can be many entries, since each content script that uses extension APIs has one, and there are extensions in the profile that declare content scripts for each website and frame that the user visits. In theory, this logic could be optimized by maintaining separate collections of targets, keyed by portId or extensionId.
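For illustration, the idea of keeping separate collections keyed by extensionId could be sketched roughly as follows. This is not code from ConduitsParent.sys.mjs: the `RemoteIndex` class, its method names, and the shape of the remote objects (`id`, `extensionId`) are all assumptions made for the example.

```javascript
// Hypothetical sketch only: RemoteIndex and the { id, extensionId } shape are
// illustrative assumptions, not the actual Hub implementation.
class RemoteIndex {
  constructor() {
    this.byId = new Map(); // remote id -> remote
    this.byExtension = new Map(); // extensionId -> Set of remotes
  }

  add(remote) {
    this.byId.set(remote.id, remote);
    let group = this.byExtension.get(remote.extensionId);
    if (!group) {
      group = new Set();
      this.byExtension.set(remote.extensionId, group);
    }
    group.add(remote);
  }

  remove(id) {
    const remote = this.byId.get(id);
    if (!remote) {
      return false;
    }
    this.byId.delete(id);
    const group = this.byExtension.get(remote.extensionId);
    group.delete(remote);
    if (group.size === 0) {
      // Drop empty groups so the index does not grow without bound.
      this.byExtension.delete(remote.extensionId);
    }
    return true;
  }

  // Returns only the remotes of one extension, so a broadcast does not have
  // to visit every remote known to the hub.
  candidates(extensionId) {
    return [...(this.byExtension.get(extensionId) ?? [])];
  }
}

const index = new RemoteIndex();
index.add({ id: "r1", extensionId: "ext-a" });
index.add({ id: "r2", extensionId: "ext-b" });
index.add({ id: "r3", extensionId: "ext-a" });
console.log(index.candidates("ext-a").map(r => r.id));
```

With such an index, the cost of selecting recipients would depend on the number of remotes belonging to one extension rather than on the total size of the collection, at the price of extra bookkeeping whenever a remote is added or removed.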

However, I'm not sure if optimizing that would meaningfully change the observed behavior here. The browser has been running for less than a day, and even with intensive browsing, I would not expect the Hub.remotes Map to become so large that iterating over its entries (with relatively light checks) would become a performance bottleneck.

The profile shows a 70-second jank. GCMajor is responsible for most of it (45 seconds total, accumulated from 0.6s, 1.1s, 0.9s, 13.2s, 1.0s, 13.5s, 15.0s), plus countless GCMinor events in between.
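As a quick check of the arithmetic, the GCMajor durations quoted above can be summed and compared against the 70-second jank. The variable names are made up for this example; only the numbers come from the comment.

```javascript
// GCMajor pause durations in seconds, as listed in the comment above.
const gcMajorSeconds = [0.6, 1.1, 0.9, 13.2, 1.0, 13.5, 15.0];
const jankSeconds = 70;

const totalGcSeconds = gcMajorSeconds.reduce((sum, s) => sum + s, 0);
const gcShare = totalGcSeconds / jankSeconds;

// The three pauses longer than 10 seconds dominate the total.
const longPauses = gcMajorSeconds.filter(s => s > 10);
const longPauseSeconds = longPauses.reduce((sum, s) => sum + s, 0);

console.log(`GCMajor total: ${totalGcSeconds.toFixed(1)}s`);
console.log(`Share of jank: ${(gcShare * 100).toFixed(0)}%`);
console.log(`Pauses over 10s: ${longPauseSeconds.toFixed(1)}s`);
```

The sum comes to about 45.3 seconds, roughly 65% of the jank, and three pauses alone account for about 41.7 seconds of it.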

Flags: needinfo?(rob)

Thanks! Is there some code that Urja could run on the Browser Console to find out how big the Hub.remotes Map is?

Maybe the real bug is that we're letting the map grow large to begin with. Could this happen if, let's say, twitch or an add-on creates one iframe per twitch chat message? Maybe there's another factor here that causes these iframes to be kept alive too long.

Flags: needinfo?(rob)

(In reply to Markus Stange [:mstange] from comment #4)

> Thanks! Is there some code that Urja could run on the Browser Console to find out how big the Hub.remotes Map is?

This is not possible yet. If the issue is somewhat reproducible, we can consider putting export before Hub to make it easily readable from the Browser Console.
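If Hub were exported as suggested, reading the size could look roughly like the sketch below. This is hypothetical: Hub is not exported today, the `remotesSize` helper is invented for this example, and the module URL in the comment is an assumption about where the module is registered.

```javascript
// Hypothetical helper: returns the number of entries in Hub.remotes, or null
// if the module does not expose Hub (which is the case at the time of writing).
function remotesSize(moduleExports) {
  const remotes = moduleExports?.Hub?.remotes;
  // Check for a numeric size rather than using instanceof Map, since the Map
  // may come from a different global than the one running this code.
  return typeof remotes?.size === "number" ? remotes.size : null;
}

// Assumed Browser Console usage, valid only once Hub is exported:
//   remotesSize(ChromeUtils.importESModule(
//     "resource://gre/modules/ConduitsParent.sys.mjs"));

// Stand-in objects to show the behavior outside the browser:
const fakeModule = { Hub: { remotes: new Map([["r1", {}], ["r2", {}]]) } };
console.log(remotesSize(fakeModule));
console.log(remotesSize({}));
```

A null result would indicate that the running build does not expose Hub, so the number cannot be read this way.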

Urja, how often do you come across this issue? Can you consistently reproduce it?

Flags: needinfo?(rob)
Blocks: webext-perf
Severity: -- → S3
Priority: -- → P3
Summary: Closing a long-running twitch tab hangs browser for minute(s) → Closing a long-running twitch tab hangs browser for minute(s) (GC in recvConduitClosed of Conduit)