Closed Bug 1658571 Opened 1 year ago Closed 10 months ago

Many ghost windows with Disconnect starting in Firefox 78

Categories

(Core :: DOM: Core & HTML, defect)

78 Branch
Desktop
All
defect

Tracking

()

VERIFIED FIXED
83 Branch
Tracking Status
relnote-firefox --- 81+
firefox-esr68 --- unaffected
firefox-esr78 --- fixed
firefox80 --- wontfix
firefox81 + verified
firefox82 + verified
firefox83 + verified

People

(Reporter: mick.pearson, Assigned: emilio)

References

(Depends on 1 open bug, Blocks 1 open bug, Regression)

Details

(Keywords: memory-leak, regression, Whiteboard: [fxperf][STR in comment 74])

Attachments

(7 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0

Steps to reproduce:

Upgrade to 78, 79.

Pseudo/earlier report: https://bugzilla.mozilla.org/show_bug.cgi?id=1651416

Current situation: https://www.reddit.com/r/firefox/comments/hlvqap/firefox_78_extremely_slow_graphic_performance/

Actual results:

Firefox explodes; performance is chronically unstable to the point of being unusable. Reports of this span all operating systems, all GPUs, and both high-end and low-end hardware.

Firefox doesn't offer a path to downgrade to 77. Staying on 77 means being constantly harassed every day. I want Mozilla to give us a build that doesn't harass us, ideally with security patches, and for future incidents a better system for rolling back and an ability to disable its naive strong-arm Update harassment policy as it stands.

Most of us can't help with this since we're using 77 and it's prohibitively difficult to switch back and forth. So we need developers to develop and do diagnostics and not ask users to do that for you. In the meantime this ticket is aimed at getting an alternative build for us and better systems in place for when Mozilla releases a bomb to the public in the future.

Expected results:

FF should work like 77. You know, like professional grade software.

Hi, mick.pearson!

Thanks for your feedback!

I'll mark this as an enhancement and assign it to the Toolkit Application Update team.

Regards,

Status: UNCONFIRMED → NEW
Type: defect → enhancement
Component: Untriaged → Application Update
Ever confirmed: true
Product: Firefox → Toolkit

Okay, make FF not explode into shambles is an "Enhancement". There must be some serious cognitive dissonance at Mozilla :(

Hi Mick,

I'm sorry to hear about the problems you are having, but I thank you for taking the time to bring them to my attention. It looks like the main problem you are having is described in Bug 1651416. I'll see if there's anything I can do to get some more attention paid to that.

As for this bug, I'm not exactly sure what you are asking for. Could you be more specific? It sounds like you want something that facilitates downgrades, but I'm not sure specifically what.

Flags: needinfo?(mick.pearson)

The main description explains that the bug you refer to isn't the same thing, and that bug was never actually addressed. What the people on Reddit want is to be able to use FF again, since they worry FF's team has lost their mind and will just keep publishing new updates without addressing the bug that makes FF completely unusable. These users aren't being hyperbolic about FF's erratic behavior, which likely affects all applications running on the same system because FF is probably exhausting the system's resources, so these users are stuck using FF 77.

Please actually read the Reddit thread. Be responsible to your user base. Escalate the matter as it should be. This ticket is to ask for an immediate half measure that should be possible to turn around in short order. I'm really tired of this, and just don't like being reminded of it every day when my system boots up and FF tells me in multiple ways to update it, when updating will make it explode.

Flags: needinfo?(mick.pearson)

I think I speak for everyone that we also would like an apology for being ignored.

Just signed up an account to post here, followed to this ticket from the reddit post. Having the same problem with too many youtube tabs on 1 window. I had more tabs in v77 and did not have problems.

(In reply to Mick P. from comment #4)

The main description explains the bug you refer to isn't the same thing.

Ah, my mistake. In that case, this bug is not describing an Updater enhancement. It sounds like it's a problem with Graphics.

Type: enhancement → defect
Component: Application Update → Graphics
Product: Toolkit → Core

@Kirk, yes it's describing a CRISIS as the headline makes amply clear, but also I suppose it is requesting an enhancement to the update system, that's needed only because Mozilla releases are failing more dramatically than anyone could have predicted, so we need branching paths to help with Mozilla's rocky release pattern.

Changing update to be less hostile would be a nice side-effect, and I suppose that can be done as afterthought (I describe it as an afterthought) however what this ticket is requesting is a public download file we can use to install FF 77 with cherry picked security updates until Mozilla solves this in some way or another. (And obviously inherent in this is that the originating issue will be solved.)

I want to add, the original post isn't as clear on the status of the old ticket as I recalled. You do have to dig into the links. I apologize for that and I can see it's not explicitly said (for sake of brevity, for fear of burying the lead) in the original description.

Mick P, thank you for the report. Could you attach your about:support information to this bug? Thank you.

Flags: needinfo?(mick.pearson)
Whiteboard: [fxperf]
Attached file about:support

Attaching by request

Flags: needinfo?(mick.pearson)

performance issue seems to be removed after upgrading to version 80

(In reply to Mick P. from comment #12)

Created attachment 9172052 [details]
about:support

Attaching by request

Mick P, thank you! It was for Firefox 77. Can you also attach about:support of 78 or 79? It might have information about why performance is bad. And can you also check the performance of Firefox 80 or nightly?

And is it possible to update the graphics driver to a recent one? It seems a bit old.

Flags: needinfo?(mick.pearson)

(In reply to Mick P. from comment #12)

Created attachment 9172052 [details]
about:support

Attaching by request

Compositor was "Direct3D 11 (Advanced Layers)". When WebRender is enabled, performance might improve. It can be enabled by setting "gfx.webrender.all=true" in about:config. The current compositor can be checked at about:support.

On Intel gen7.5 GT3, WebRender is not enabled yet. It has been enabled since Firefox 80 on Win10 desktop PCs (Bug 1651172) and will be enabled from Firefox 81 on Win10 laptop PCs (Bug 1654262).

(In reply to xiwizv from comment #13)

performance issue seems to be removed after upgrading to version 80

We thought that for 79 too. Either it doesn't affect dev builds or it takes about a week to fully set in. For this reason I think the problem is probably in a caching file that is deleted on update and builds up into a database that degrades as it grows in size, since this is persistent. You can read about these theories in the Reddit topic. But from what I've seen there I think this problem exists for everyone in the world and that Mozilla should not be sitting on its hands, however authority is distributed in its organization.

Flags: needinfo?(mick.pearson)

(In reply to Sotaro Ikeda [:sotaro] from comment #15)

(In reply to Mick P. from comment #12)

Created attachment 9172052 [details]
about:support

Attaching by request

Compositor was "Direct3D 11 (Advanced Layers)". When WebRender is enabled, performance might improve. It can be enabled by setting "gfx.webrender.all=true" in about:config. The current compositor can be checked at about:support.

On Intel gen7.5 GT3, WebRender is not enabled yet. It has been enabled since Firefox 80 on Win10 desktop PCs (Bug 1651172) and will be enabled from Firefox 81 on Win10 laptop PCs (Bug 1654262).

You need to read the Reddit topic before discussing the issue. It covers these things. I just volunteered to submit a ticket since no one else was doing it and I had an account and I have a technical background, but don't look at me as the face of this problem. The ball's in Mozilla's court and I think others have already uncovered more useful information that's laid out in the links provided in the main description.

(Anyway, WebRender isn't the issue. If the problem is in it, then it's affecting everyone whether their system used it in 77 or not. I can't submit more info and I don't think about:support will change between 77 and 78 but I'm not going to install those versions of FF to get that for you. You guys--or someone--are the developers, I am a developer, so I know what your responsibilities are and how you should be proceeding and that involves exploring the changes made in the identified build and doing tests, so don't ask us to reinstall FF unless you have a special build you want everyone to try. You need to identify what change killed FF.)

(In reply to Sotaro Ikeda [:sotaro] from comment #14)

(In reply to Mick P. from comment #12)

Created attachment 9172052 [details]
about:support

Attaching by request

Mick P, thank you! It was for Firefox 77. Can you also attach about:support of 78 or 79? It might have information about why performance is bad. And can you also check the performance of Firefox 80 or nightly?

And is it possible to update the graphics driver to a recent one? It seems a bit old.

It's affecting everyone with new and old adapters on all operating systems, with high end and low end software. Is nobody home at Mozilla because of the virus? Really guys I don't want to be rude (but I will be) please read the topics. (It's not about my system in particular. Hundreds if not millions of people are experiencing this.)

Severity: -- → S3

Compositor was "Direct3D 11 (Advanced Layers)". When WebRender is enabled, performance might improve. It can be enabled by setting "gfx.webrender.all=true" in about:config. The current compositor can be checked at about:support.

On Intel gen7.5 GT3, WebRender is not enabled yet. It has been enabled since Firefox 80 on Win10 desktop PCs (Bug 1651172) and will be enabled from Firefox 81 on Win10 laptop PCs (Bug 1654262).

I'm pretty sure my "gfx.webrender.all=false", after I updated to FF80. And I have a Win10 PC.
I made a manual change to toggle it to true. Not sure if it helped with the performance, since I did not do a test comparison with the two settings.
At least since the update, 200+ tabs have not slowed down on YouTube or any graphics-heavy sites.

Sorry, by "high end and low end software" I meant hardware naturally. On WebRender, someone in the Reddit topic did tests with it and found no changes, so I think that would rule out the bug being inside WebRender code. Everyone experiencing the problem still in 78 is reporting very specific experiences, so I think it's safe to say we're all experiencing the same thing, and someone said it's on all of their systems including several form factors, and someone said their neighbor or friend was experiencing it too. So these aren't isolated incidents. I think that it's either just masked under certain circumstances, or your average user doesn't notice the effects (hard to imagine in severe cases), or will just stop using FF unceremoniously (seems more likely).

I think it's a real possibility that something in dev (nightly) builds masks it since someone among the devs surely would've reported it otherwise. I would look at caching settings. Not necessarily the web cache but other kinds of caches, since the problem seems to snowball across sessions.

This bug is kind of a mess. I've marked comment 6 and comment 9 obsolete because the profile links posted there were for profiles that weren't published, so nothing is accessible. Too much of the relevant information is spread out over too many comments, and over a thread on Reddit. I'm going to try to consolidate it here in this comment.

First though, I'm going to respond to comment 0 by the reporter:

In the meantime this ticket is aimed at getting an alternative build for us and better systems in place for when Mozilla releases a bomb to the public in the future.

You're welcome to use the ESR build, which releases and updates at a slower pace. It's clear from your comment 0 that you're really really upset. We're going to try to get to the bottom of this, but the more actionable data we can gather in one place, the better.

Here are the profiles from the Reddit thread:

This is the most actionable information in there. Most of the rest is suggestions for using mozregression, and comments echoing that they're having similar issues. Unfortunately, while those comments certainly bolster the case that there's a real problem here, they don't give us actionable data to help us reproduce or diagnose.

So we'll work with these profiles and see what we can get.

The pattern I'm seeing in these profiles is long garbage-collection/cycle-collection pauses. Something is creating a lot of objects on the heap, which is taking the main thread a long time to traverse for mark and sweep.

My first hypothesis here is that one of the user's WebExtensions is injecting itself into certain web pages and creating a lot of garbage. My recommendation for people who are able to reliably reproduce this is to test disabling their add-ons one by one and seeing if the problem goes away. If my hypothesis is correct, and a WebExtension is at fault here, if we can identify it, there's a much higher chance that we can reproduce the problem in the lab and figure out a solution. That solution might be outreach to the WebExtension author, or fixes to our WebExtension API surface.

Flags: needinfo?(mick.pearson)

Hi Josh,

Yes, this appears to be a similar problem as the other profiles that I mentioned in comment 26: long CC/GC pauses.

I'm curious - if you (temporarily) disable your add-ons and restart the browser, does the problem recur?

Flags: needinfo?(Josh)

(In reply to Mike Conley (:mconley) (:⚙️) from comment #29)

Hi Josh,

Yes, this appears to be a similar problem as the other profiles that I mentioned in comment 26: long CC/GC pauses.

I'm curious - if you (temporarily) disable your add-ons and restart the browser, does the problem recur?

Hi Mike,

I'm not against trying that, but from past experience the issue seems to take a few days to build up. So if I disable all the add-ons it might not happen for a week or so. I can still attempt to try this if you think it will be helpful though.

Basically any time I restart Firefox it's good for at least a day or two; sometimes it's been as long as a week.

Flags: needinfo?(Josh)

(In reply to Josh Gold from comment #30)

Hi Mike,

I'm not against trying that, but from past experience the issue seems to take a few days to build up. So if I disable all the add-ons it might not happen for a week or so. I can still attempt to try this if you think it will be helpful though.

Basically any time I restart Firefox it's good for at least a day or two; sometimes it's been as long as a week.

What you're describing and what these profiles are showing sounds more and more like a slow leak over time.

I think it would be very helpful to confirm or refute the add-on hypothesis, but considering how much time it takes to accumulate, we might want to try an educated guess on which add-ons we should disable - rather than bisecting and doing this week after week.

This is the list of add-ons from the profile:

    Cookie Quick Manager
    Tracking Token Stripper
    Stylus
    Temporary Containers
    HTTPS Everywhere
    Tab Session Manager
    Buster: Captcha Solver for Humans
    Behind The Overlay Revival
    Firefox DevTools ADB Extension
    Wikipedia (en)
    Google
    Privacy Badger
    minerBlock
    Facebook Container
    Disconnect
    Tab Suspender (memory saver)
    YouTube Ad Auto-skipper
    Dark
    DoH Roll-Out
    WX Download Status Bar
    Amazon.com
    Firefox Screenshots
    DuckDuckGo
    ClearURLs
    Disable WebRTC
    Bing
    Theme Font & Size Changer
    Form Autofill
    Web Compat
    eBay
    Firefox Multi-Account Containers
    YouTube High Definition
    KeePassXC-Browser
    MetaMask
    uBlock Origin
    Privacy Possum
    ProtonMail (unofficial)
    Form History Control (II)
    Polisis

We can eliminate the search engine ones and built-in ones for now, that leaves us:

    Cookie Quick Manager
    Tracking Token Stripper
    Stylus
    Temporary Containers
    HTTPS Everywhere
    Tab Session Manager
    Buster: Captcha Solver for Humans
    Behind The Overlay Revival
    Firefox DevTools ADB Extension
    Privacy Badger
    minerBlock
    Facebook Container
    Disconnect
    Tab Suspender (memory saver)
    YouTube Ad Auto-skipper
    Dark
    WX Download Status Bar
    ClearURLs
    Disable WebRTC
    Theme Font & Size Changer
    Firefox Multi-Account Containers
    YouTube High Definition
    KeePassXC-Browser
    MetaMask
    uBlock Origin
    Privacy Possum
    ProtonMail (unofficial)
    Form History Control (II)
    Polisis

Do you know if there are overlaps from this list from other people who are experiencing the same issue as you?
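
To make the cost of blind bisecting concrete, here is a minimal sketch in Python. It assumes exactly one add-on is responsible and that disabling half of the candidates is a valid test; the `reproduces` callback is hypothetical and stands in for "run Firefox with only these add-ons enabled and wait to see whether the slowdown returns".

```python
from typing import Callable, Sequence, Tuple


def worst_case_rounds(candidate_count: int) -> int:
    """Upper bound on halving rounds needed to isolate a single culprit."""
    if candidate_count < 1:
        raise ValueError("need at least one candidate")
    # ceil(log2(n)) computed with integers to avoid floating-point error.
    return (candidate_count - 1).bit_length()


def bisect_addons(
    addons: Sequence[str],
    reproduces: Callable[[Sequence[str]], bool],
) -> Tuple[str, int]:
    """Return (culprit, rounds used), assuming exactly one add-on is at fault.

    `reproduces(enabled)` is a hypothetical test: it should return True if the
    problem shows up while only the add-ons in `enabled` are active.
    """
    candidates = list(addons)
    rounds = 0
    while len(candidates) > 1:
        first_half = candidates[: len(candidates) // 2]
        rounds += 1
        if reproduces(first_half):
            candidates = first_half
        else:
            candidates = candidates[len(first_half):]
    return candidates[0], rounds


# The filtered list above has 29 entries.
print(worst_case_rounds(29))  # 5
```

With 29 candidates that is up to 5 rounds, and since each round can take about a week before the slowdown shows up, an educated guess at likely culprits is the cheaper first step.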

Flags: needinfo?(Josh)

After 6 days of FF80, the performance decline is starting to kick in. Decided not to update to 80.1 until I can provide these data to help find the culprit of these performance issues

Here is my about:support:
https://bin.privacytools.io/?8ed8f75f025b52fd#LzUv5oTZAs77xPA7XznAFhYr70BoBWDXjmR8fzmPKGk=

Here are my profiler results:
https://share.firefox.dev/2YSF0Aw

Not sure if I did these right.
Let me know if I need to select alternative settings.

Sadly Firefox will not restore my tabs properly without the Tab Session Manager, so I basically did the same thing but manually. I've disabled all my add-ons except Tab Session Manager and will see if the issue occurs. If it does, then we can say it's either Firefox or this one extension, and at least that narrows it down a bit.

If it doesn't, then I can keep adding extensions back one at a time to see when it starts.

Flags: needinfo?(Josh)

Interestingly, Tab Session Manager is one of the WebExtensions I would have suggested disabling - both you and xiwizv have it enabled. You have some other common extensions enabled as well:

  • uBlock Origin
  • Facebook Container
  • Privacy Badger
  • HTTPS Everywhere

(In reply to Josh Gold from comment #33)

Sadly firefox will not restore my tabs properly without the tab session manager

What does Firefox do instead of restoring your tabs properly? I presume upon restart, you go to History > Restore Previous Session, and something goes wrong?

Flags: needinfo?(Josh)

After some messing around, it was a combination of 2 issues: Tab Session Manager and Tab Suspender. I needed the suspender to unsuspend tabs that were saved as suspended. In any case, so far it's been running for about 8 hours with just those 2 extensions enabled without any issue. I will probably give it 2 weeks before saying it's neither of them.

If the issue does occur though, I'll take another profile snapshot and post it here.

Flags: needinfo?(Josh)

I cleaned up some of the add-ons, but still have performance issues. Not sure if disabling the add-ons counts as removing them from the possibility of creating problems.
https://share.firefox.dev/3hW9w3R
https://bin.privacytools.io/?05a4bc440507be59#9j5KxGmLjmBEhst/GcyZQKGeOuO7y3+e2T2W+Dso7Uw=

(In reply to Josh Gold from comment #30)

I'm not against trying that, but from past experience the issue seems to take a few days to build up. So if I disable all the add-ons it might not happen for a week or so. I can still attempt to try this if you think it will be helpful though.

Basically any time I restart Firefox it's good for at least a day or two; sometimes it's been as long as a week.

In my experience restarting FF doesn't change the misbehavior. I think that's the general consensus on Reddit, but after an update the behavior goes away and takes a few days to gradually build up until it becomes bad enough to need to revert to 77.

If you're experiencing the same thing, it suggests your FF is doing something during startup that cleans it out, so to speak, that is normally only being done upon upgrading/downgrading. I think looking at anything FF does during an upgrade to reset itself would provide insight, and that can be cross-referenced with the particular build that a Redditor said they think isolates the misbehavior. You can get that build number in the links.

I have very few add-ons, but even if removing add-ons did solve the problem (is there an easy way to try this?), I'm confident the add-ons aren't to blame; they would just happen to be doing something legitimate that's broken in 78.

Flags: needinfo?(mick.pearson)

(In reply to Mike Conley (:mconley) (:⚙️) from comment #26)

This bug is kind of a mess. I've marked comment 6 and comment 9 obsolete because the profile links posted there were for profiles that weren't published, so nothing is accessible. Too much of the relevant information is spread out over too many comments, and over a thread on Reddit. I'm going to try to consolidate it here in this comment.
You're welcome to use the ESR build, which releases and updates at a slower pace. It's clear from your comment 0 that you're really really upset. We're going to try to get to the bottom of this, but the more actionable data we can gather in one place, the better.

We couldn't use the ESR system because Mozilla bafflingly released 78 for ESR and regular users at the exact same time, nullifying any benefit from ESR.

(Can someone edit my previous post's obvious mistakes? and change built->build for clarity?)

"This bug is kind of a mess" yes it is! I don't think you're going to find anything blaming add-ons. It's a knee-jerk response. I think the problem is unlikely isolated to certain add-on profiles. And if so an API that add-ons rely on is broken, like a database regressed to bad performance. If an add-on was to blame then downgrading to 77 wouldn't fix the problem unless the add-on actually followed suit and downgraded itself.

Interesting, I must have missed the part about it only being fixed on an update. For me a restart always resolved the problem, at least for a little bit. I'm not sure if this is an add-on or a Firefox setting I have, but when I restart Firefox and restore my tabs it doesn't load them all instantly; it only loads the active one, so perhaps that is why I don't experience it right away?

In any case, from my side it's been running for about a day and I haven't hit the issue yet. I will say (could just be me imagining it) that it does seem like it's slowing down as I keep it open. It used to change tabs instantly; now it feels just barely delayed.

For the record, I run FF at startup every day, so I'm not using it for days on end (I used to before SSD drives made hibernation a thing of the past) and I have 3 windows with hundreds of tabs. They don't load until you click on one or if it was the last viewed tab. A lot of the Reddit users have claimed they use lots of tabs. I assume everyone does that unless they're a grandma, but I don't know, maybe most people don't. It's possible that is a way to provoke the misbehavior, so to speak. But once it builds up, over a few days, quitting FF and starting it again doesn't affect the misbehavior. Only updating (reinstalling) sets it back so it takes a few days to become noticeable/unbearable.

I think that's most people's experience going by reactions on Reddit after everybody tried to use 79 after it was released and were initially optimistic. My nontrivial add-ons are called AdBlocker For YouTube, Disconnect, uBlock Origin. New versions of FF have been so hostile to add-ons there isn't much add-ons can do anymore! I have a few more that are not always-on style add-ons, hard to imagine any of them being significant factors.
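
The add-on overlap question raised earlier in this thread can be checked mechanically. A minimal sketch in Python, using only the two lists that appear in this bug (the filtered list from the profile and the three add-ons named just above); the helper name `shared_addons` is made up for illustration, and it assumes add-on names are spelled identically across reports. An overlap only narrows the candidates; it does not show that either add-on is at fault.

```python
from typing import Iterable, List


def shared_addons(*addon_lists: Iterable[str]) -> List[str]:
    """Return the add-on names present in every list, sorted for stable output."""
    if not addon_lists:
        return []
    common = set(addon_lists[0])
    for names in addon_lists[1:]:
        common &= set(names)
    return sorted(common)


# Filtered add-on list from the profile, as posted earlier in this bug.
profile_addons = [
    "Cookie Quick Manager", "Tracking Token Stripper", "Stylus",
    "Temporary Containers", "HTTPS Everywhere", "Tab Session Manager",
    "Buster: Captcha Solver for Humans", "Behind The Overlay Revival",
    "Firefox DevTools ADB Extension", "Privacy Badger", "minerBlock",
    "Facebook Container", "Disconnect", "Tab Suspender (memory saver)",
    "YouTube Ad Auto-skipper", "Dark", "WX Download Status Bar", "ClearURLs",
    "Disable WebRTC", "Theme Font & Size Changer",
    "Firefox Multi-Account Containers", "YouTube High Definition",
    "KeePassXC-Browser", "MetaMask", "uBlock Origin", "Privacy Possum",
    "ProtonMail (unofficial)", "Form History Control (II)", "Polisis",
]

# Always-on add-ons named in the comment above.
reporter_addons = ["AdBlocker For YouTube", "Disconnect", "uBlock Origin"]

print(shared_addons(profile_addons, reporter_addons))
# ['Disconnect', 'uBlock Origin']
```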

My problems do not go away from quitting FF. Upgrading from 79 to 80 does.

Personally I don't use many tabs. I might have a few days where I have 40-100 open, but typically I'll try to keep my tab count down to around 30-50, all in one window.

It's been a few days for me with extensions disabled and no issues so far.

I've had performance issues over the last several releases (including 80), and captured a few profiles over that time just out of curiosity. This most recent one is representative of what I've seen in other cases when the browser has gone nonresponsive: https://profiler.firefox.com/public/8bgmnhvzp3j57ehpn8pfeqwt3c7w363skrh3v6g

This seems to correlate with when I'll see a Web Content process get stuck at 100% CPU for a while (or forever). If I'm reading that right, it seems unexpected that a callstack starting more or less with swapping bytes should be so expensive.

(In reply to xiwizv from comment #41)

My problems do not go away from quitting FF. Upgrading from 79 to 80 does.

Please don't say that! The Mozilla devs will close this ticket as soon as anyone makes such claims. (They did the last one.) If it clears up the Reddit topic should fill up with reports, but we should give it a few weeks after a new release is pushed onto the public channel.

Let me rephrase: upgrading from 77 to 78, 78 to 79, or 79 to 80, the problems go away briefly, then start again within 1-2 days of usage.
I gave up on waiting BTW, already downgraded to 77. Will just keep waiting until this bug is fixed.

I am facing the same issue with 2 of my laptops. At this point Firefox has become unusable and I've switched to a different browser. Everything is lagging or taking ages to render. When I updated to version 80.0.1 the problem disappeared for maybe a day or two, but now it's back to constant lagging and rendering issues like many people already stated.

Same issue here, using Firefox 80.0.1 on OS X 10.15.6.

Everything is fine after a Firefox restart. But after a few hours of use, it slows down to the point of becoming unusable. I tried with all plugins disabled, and I tried with WebRender enabled and disabled. This slowness comes back every time.

Problem appeared with version 78 too.

These descriptions ("fine for a while, and then slowly becoming more and more sluggish") sound like a memory leak to me.

For any of the users that are experiencing this - next time Firefox gets sluggish, can you please go to about:memory, click "measure and save", and then attach it to this bug? That might help confirm the memory-leak hypothesis, and give us an area to focus on to actually try to address this.

And while about:memory has the ability to anonymize your memory report, doing so scrubs pretty useful information from the report, so while I can look at anonymized memory reports, the most useful ones will be the ones that aren't anonymized.

Why is Mozilla sitting on its hands?

We're waiting on more information to make an informed prioritization decision. Specifically, we're waiting on a memory report requested in comment 48 and comment 49 of this bug from one of the affected users so that we can understand the scope and user impact of any such leak. At that point, we can make a prioritization decision.

Attached file memory-report.json.gz

This issue has also been affecting me severely. I find that the browser rendering becomes sluggish after a few hours of browsing content-heavy pages, but not necessarily with tons of tabs. Very different from my experience in the past. FF will be using about 80% of my RAM (16 gigs). When it gets in this sluggish state, it will ramp up the CPU about every 30 seconds, for 5-10 seconds, presumably to do GC or something, during which time the webpage is unresponsive. I can type or click, and those inputs will register after the page snaps out of its frozen state. I'm not great with application programming but I can give logs and poke things if you tell me what to do.

Sorry for double posting. I'd just like to add that for people like me who are affected, this is a breaking bug. I have to quit and reopen in order to get a functional browser again. I'm currently debating what browser to switch to, because I count on having several tabs open all day for school. When the browser starts to slow down, it takes twice as long to do normal tasks. I have a quad core i7, 16g of RAM and an SSD, so it hurts to see firefox crawl at the speed of a TI-83.

Thank you, taiteclark1. Comment 52 contains really useful data. From your memory report, it looks like you're accumulating "ghost windows" - aka, full web page windows that continue to exist in the background instead of getting cleaned up. This causes the heap to fill up with more and more items, and so cycle collection pauses get longer as the collector scans it. Thank you for this, your comment 52 and attachment were very helpful.

Hey mccr8, what's the best way for taiteclark1 to diagnose this ghost window problem? I'm looking at their extension list, and I'm not seeing anything obvious. Any suggestions?

Flags: needinfo?(continuation)

I'm slightly suspicious this might be triggered by the Disconnect extension. Other affected users also seem to have it installed, and I see someone posted a comment in a review saying that since Firefox 78, it was preventing memory getting freed.

taiteclark1 (or indeed other users who are seeing this problem), would you be able to try uninstalling Disconnect (at least temporarily) to see if that makes a difference? Thanks.

Flags: needinfo?(taiteclark1)
See Also: → 1651416

I've read through the comments here and on Reddit, and it does sound like leaking windows (ghost windows), which is what is showing up in the about:memory report. For anybody who is experiencing this issue, if they want to see if they are experiencing ghost windows without waiting until they build up so much that it brings your system to a halt, they can go to about:memory, click on "minimize memory usage", wait for that to finish, then click on "measure". In the resulting report, you can search for ghost-windows, and see if there are any entries that are non-zero.
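
For anyone who has already saved a report with "measure and save", the same check can be scripted. This is a minimal sketch in Python, not an official tool. It assumes the saved file is gzip-compressed JSON with a top-level "reports" list whose entries carry "process", "path" and "amount" fields, and that the ghost window count is reported under the path "ghost-windows"; if your file is laid out differently, adjust the field names. The function name `ghost_window_counts` is made up for illustration.

```python
import gzip
import json
from collections import defaultdict
from typing import Dict


def ghost_window_counts(report_path: str) -> Dict[str, int]:
    """Sum the "ghost-windows" entries per process in a saved memory report.

    Assumes the layout described above (gzip-compressed JSON with a
    "reports" list); entries without the expected fields are skipped.
    """
    with gzip.open(report_path, "rt", encoding="utf-8") as handle:
        data = json.load(handle)

    counts: Dict[str, int] = defaultdict(int)
    for entry in data.get("reports", []):
        if entry.get("path") == "ghost-windows":
            process = entry.get("process", "(unknown process)")
            counts[process] += int(entry.get("amount", 0))
    return dict(counts)


def describe(counts: Dict[str, int]) -> str:
    """Render only the processes that actually have ghost windows."""
    leaking = {proc: n for proc, n in counts.items() if n > 0}
    if not leaking:
        return "no ghost windows reported"
    return "\n".join(f"{proc}: {n} ghost window(s)" for proc, n in sorted(leaking.items()))
```

Calling `print(describe(ghost_window_counts("memory-report.json.gz")))` on a saved report lists the processes with a non-zero count, which is the same signal as searching the about:memory page by hand.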

(In reply to Mike Conley (:mconley) (:⚙️) from comment #54)

Hey mccr8, what's the best way for taiteclark1 to diagnose this ghost window problem? I'm looking at their extension list, and I'm not seeing anything obvious. Any suggestions?

It could be anything, really. Looking at bug 1651416 comment 7, somebody says they found a regression range for the issue. There aren't really all that many changesets in there, once you ignore the wpt stuff. Baku has some cookie-related changes, which is suspicious given that there are a few cookie and tracking related addons in that list, but as far as I can tell his patches were backed out and so they wouldn't actually be in the build. It still might be worth looking at cookie related addons.

There's also a number of Sync changes in that range, so maybe it could be related to that?

Blocks: GhostWindows
Component: Graphics → DOM: Core & HTML
Flags: needinfo?(continuation)
Keywords: memory-leak
Summary: FF 78 79 pathological performance, users need rollback to 77, alt release now → Many ghost windows starting in Firefox 78

Comment 35 says they stopped having the issue when they disabled "tab session manager and tab suspender."

(In reply to Andrew McCreight [:mccr8] from comment #57)

Comment 35 says they stopped having the issue when they disabled "tab session manager and tab suspender."

For the record it's rushing to conclusions to change the title here, etc. There could easily be a ticket made to address "ghost windows" (less baggage that way) and then see if that changes the situation here. But I didn't notice Josh Gold's report that he thinks it cleared up, or the subsequent follow-ups. That's the first that registered for me. If ghost windows can be catastrophic that's probably what it is, but as a matter of best practices and optics I would wait for a confirmation to shift gears.

Also we don't appreciate this ordeal, and I for one still feel that Firefox is totalitarian on the forced updates front, that should be addressed separately.

(In reply to Andrew McCreight [:mccr8] from comment #56)

It could be anything, really. Looking at bug 1651416 comment 7, somebody says they found a regression range for the issue. There aren't really all that many changesets in there, once you ignore the wpt stuff. Baku has some cookie-related changes, which is suspicious given that there are a few cookie and tracking related addons in that list, but as far as I can tell his patches were backed out and so they wouldn't actually be in the build. It still might be worth looking at cookie related addons.

Someone (devs) should try to simulate the experience based on the memory profile (if an add-on is the catalyst install the add-ons in a test bed as necessary.) I suspect that "regression range" will prove accurate. Bugs can be wile. I think the person who did all the work to install each build and test its UX deserves praise and has been ignored instead.

(Meant to type *wily, meant to look up spelling. Sorry, hate to amend a comment this way.)

I uninstalled Disconnect and it's been fine all day. I haven't been stressing it a lot, though. If the problem doesn't come back in the next few days I would blame that extension. In the meantime, maybe people can install Disconnect and see if it does anything unusual. Also, there may be other extensions that reveal the same problem.

Flags: needinfo?(taiteclark1)

Disabling Disconnect did not solve the problem for me.

I just now did the minimize & measure suggested in comment 56. The main process had 0 ghost windows, and the web processes had 29, 23, 31, 34, 19, 10, 23, 20 ghost windows respectively. Meanwhile, at any given time 1-3 of those processes is generally using 100% CPU.

(In reply to Marshall from comment #61)

Disabling Disconnect did not solve the problem for me.

Could you please attach your about:support in a file to this bug so that I could see a list of your addons? The most likely cause here is some bad interaction with an addon.

Attached file about-support.json

Attached about:support data.

I installed the set of addons in comment 12, and I'm able to reproduce some leaking behavior. Not everything is showing up as ghost windows, but tabs are still alive long after they were closed.

Summary: Many ghost windows starting in Firefox 78 → Many ghost windows with addons starting in Firefox 78

The leaking windows are being held alive by missing references to script elements, which kind of makes sense with the way script blocking addons seem common for people having issues.

Assignee: nobody → continuation

I have some steps to reproduce a leak. I haven't completely tried to minimize the steps involved.

  1. Create a new profile, install Disconnect. Restart the browser.
  2. Open a simple web page, like news.ycombinator.com in 8 or more tabs. This will keep the processes alive.
  3. Open a new tab, load rei.com by typing it into the address bar. Click around to navigate to different pages on the site, maybe 8 or so times.
  4. In the same tab, load cnn.com by typing it into the address bar. Click around to navigate to different pages on the site, maybe 8 or so times.
  5. Close the cnn.com tab. Open about:memory in a new tab. Click on minimize memory usage, wait a few seconds, then click on measure. Search the page for rei.com with Ctrl-F or whatever. rei.com should not be present in the memory report. In the leaking state, it is.
Whiteboard: [fxperf] → [fxperf][STR in comment 67]

I ran mozregression with the steps I gave in the previous comment and it turned up this regression range for June 4 containing bug 1640135: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=e9ba31526222335ae6549998b6958f33a8fdb798&tochange=f318f0c9b8f38f9e69eaa587d855f147ecff3e78

I noticed that in the "good" states that Disconnect didn't actually have any little numbers on its icon indicating that it was blocking anything. Therefore, I'm not sure if Disconnect simply doesn't work at all without that changeset (and it always leaks when it blocks something), or if some API was added in that changeset that leaks.

The changes in bug 1640135 seem to involve only strings, so I don't know how that could directly cause leaks.

Gary, do you know why bug 1640135 might have made Disconnect go from not blocking things to blocking things? The addon looks like it hasn't been updated for a year. Thanks.

Flags: needinfo?(xeonchen)

Using DMD heap scan mode, it looks like the reference that is keeping the script element alive might be ScriptLoadRequest::mFetchOptions::mElement. The ScriptLoadRequest is being held by a nsMainThreadPtrHolder created in PreloaderBase::RedirectSink::RedirectSink, which is being held alive by a RedirectSink created in PreloaderBase::NotifyOpen(). That looks like it is being held alive by some kind of about:blank channel. The heap scan scripts can't find any references to the channel for some reason. Maybe this is another case where a channel holds a redirect callback due to an addon, and then we fail to do a cancel or whatever on the channel in some error case, and we create a leak?

Apologies for not being clear when I said "After some messing around it was a combination" I was referring to my tabs not restoring properly not to the problem being fixed.

I've been running Firefox for about 2 weeks. Sadly it restarted halfway through, but besides that it's been running more or less fine. I enabled one new extension yesterday and will see how it behaves for the week; if there are no issues I will keep repeating this until I notice it acting slow.

I can reproduce this bug.

Steps to reproduce:

  1. Go to about:preferences#general.
  2. Disable Use recommended performance settings.
  3. Set Content process limit to 1.
  4. Install Disconnect.
  5. Restart Firefox.
  6. Open http://example.com in first tab.
  7. Open rei.com in second tab.
  8. Close the second tab.
  9. Open about:memory in a new tab.
  10. Click on Minimize memory usage.
  11. Wait a few seconds, then click on Measure.
  12. Search for rei.com in the page.

rei.com should not be present in the memory report.

That regression range makes more sense than the one I found. There's a lot of stuff in there relating to preloading, and some of the data structures involved in the leak have preloader in the name, as I said in comment 71.

Jens, do you know who might be able to look at this? There's a severe leak with a certain addon installed that looks related to networking channels. The leak looks like it might be a regression from some preloading work that landed in 78. Thanks.

Flags: needinfo?(jstutte)

(In reply to blinky from comment #75)

Regression range:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=515f4054b1ccaa49b1f25f70f4e3a6f610184690&tochange=944ce5e286edeeedf0b54e0b50d4cc48ff63f12f

According to this regression range, it looks unlikely to be related to bug 1640135.
I'm going to cancel the needinfo flag.

Flags: needinfo?(xeonchen)

From comment 71 and looking into the code I stumbled over bug 1501608, which seems to directly impact the way those references are handled and whose patch landed May 13, just two days after the regression range given in comment 75.

Denis, Olli, any thoughts?

Flags: needinfo?(jstutte)
Flags: needinfo?(dpalmeiro)
Flags: needinfo?(bugs)

If I use the steps in https://bugzilla.mozilla.org/show_bug.cgi?id=1658571#c74 then this reproduces even earlier than bug 1501608. The regression range for me using those steps is between 2020-05-10 and 2020-05-11. Unfortunately, mozregression can't find any builds in between. I will try to bisect this further by hand to help find a culprit.

Flags: needinfo?(dpalmeiro)
Flags: needinfo?(bugs)

[Tracking Requested - why for this release]: We've been shipping this for a few releases, so it might not be the most widespread problem ever, but the impact for people who are affected (people running the Disconnect addon, and possibly other addons) is very severe, and the volume of comments in here, the other bug and the Reddit thread linked in comment 0 is among the highest I've seen for any regression, so it would be good if we could fix this.

Assignee: continuation → nobody

The STR in comment 74 seems better than the one I came up with, at least in terms of not having Disconnect just break at some point in the regression process. (At least, I assume it isn't.)

Whiteboard: [fxperf][STR in comment 67] → [fxperf][STR in comment 74]

Using the steps in https://bugzilla.mozilla.org/show_bug.cgi?id=1658571#c74, I was able to bisect this down to the following changeset:

o changeset: 529112:a7541ff8e630
| date: Mon May 11 14:07:24 2020 +0000
| summary: Bug 1618292 - Make ScriptLoadRequest derive and use PreloaderBase to support new preload as speculative load feature, r=smaug

Regressed by: 1618292

Probably outside the scope of a dot release unless the fix is super well-scoped and low risk, but I'll track it for 81 just in case.

See Also: → 1359201

I'm somewhat familiar with the preloading stuff, if this is needed I can look into it, though I'm busy with other stuff atm...

Given the regression range, I bet what's going on is that the relevant ScriptLoadRequest keeps a reference to the channel via PreloaderBase::mChannel, and the disconnect add-on causes us to never hit PreloaderBase::NotifyStop, which is what clears it out. Or this condition is not holding in some other way and we incorrectly early-out from NotifyStop.

Not having dug into the root cause yet, a potential fix that future-proofs this code from these kinds of mistakes could be something like iterating the preloads and calling NotifyStop(NS_ERROR_ABORT) on them in ClearAllPreloads...
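
The future-proofing idea above could look roughly like the following. This is only a minimal standalone sketch, not the actual Gecko code: `Channel`, `Preload`, `PreloadRegistry` and `kErrorAbort` are hypothetical stand-ins (the real classes, the real NotifyStop signature and NS_ERROR_ABORT are not reproduced here), and `std::shared_ptr` stands in for Gecko's refcounted pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical, simplified stand-ins; not the real Gecko classes.
struct Channel {};

constexpr int kErrorAbort = -1;  // placeholder for NS_ERROR_ABORT

class Preload {
 public:
  // Remember the channel for as long as the load is considered in flight.
  void NotifyOpen(std::shared_ptr<Channel> channel) {
    mChannel = std::move(channel);
  }
  // Stopping the load is what drops the channel reference.
  void NotifyStop(int /*status*/) { mChannel.reset(); }
  bool HoldsChannel() const { return mChannel != nullptr; }

 private:
  std::shared_ptr<Channel> mChannel;
};

class PreloadRegistry {
 public:
  void Register(const std::string& key, std::shared_ptr<Preload> preload) {
    assert(preload);
    mPreloads[key] = std::move(preload);
  }
  // Stop every registered preload before forgetting it, so a preload that
  // never received its own stop notification cannot keep a channel alive.
  void ClearAllPreloads() {
    for (auto& entry : mPreloads) {
      entry.second->NotifyStop(kErrorAbort);
    }
    mPreloads.clear();
  }
  std::size_t Size() const { return mPreloads.size(); }

 private:
  std::unordered_map<std::string, std::shared_ptr<Preload>> mPreloads;
};
```

The point of the design is that clearing the registry becomes a backstop: even if some error path skips the normal stop notification, the channel reference is released when the preloads are cleared.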

I'd guess there are more addons affected than Disconnect, but I'll add it to the summary to be more specific.

Summary: Many ghost windows with addons starting in Firefox 78 → Many ghost windows with Disconnect starting in Firefox 78
Duplicate of this bug: 1665533

Valentin, do you have time to look at this? If I recall correctly, you've previously fixed an issue with callbacks not getting cleared from a channel. Thanks.

Flags: needinfo?(valentin.gosu)

I took a look at this and I think I found a fix. I think the bad cycle is related to the RedirectSink code, so the underlying bug may be in Necko, but it should be trivial to work around.

I'm digging a bit more, but this is the patch if someone wants to play with it: https://hg.mozilla.org/try/rev/b34483aa802d41b70657ee024e8dc71850a96e87

PreloaderBase -> RedirectSink -> PreloaderBase is a strong,
non-cycle-collected reference cycle. In cases where we don't drop
the channel because we never get a NotifyStop notification, it can
cause leaks.

I'm investigating the root cause of the lack of NotifyStop, but this
should fix the leak and is correct anyhow.

Move the class to the cpp file to ease debugging and changes.
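
To illustrate the kind of cycle the commit message above describes, here is a minimal standalone sketch. It is not the patch: `Preloader`, `StrongSink` and `WeakSink` are hypothetical names, and `std::shared_ptr` / `std::weak_ptr` are used as stand-ins for Gecko's refcounted pointers and the weak pointer the fix introduces.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for the classes named above; illustrative only.
struct Preloader;

struct StrongSink {
  std::shared_ptr<Preloader> preloader;  // strong back-reference: forms a cycle
};

struct WeakSink {
  std::weak_ptr<Preloader> preloader;  // weak back-reference: no cycle
};

struct Preloader {
  std::shared_ptr<StrongSink> strongSink;
  std::shared_ptr<WeakSink> weakSink;
};

// Returns true if the preloader is still alive after its only outside
// reference is gone, i.e. the strong cycle kept it alive.
bool StrongCycleKeepsPreloaderAlive() {
  std::weak_ptr<Preloader> observer;
  {
    auto p = std::make_shared<Preloader>();
    p->strongSink = std::make_shared<StrongSink>();
    p->strongSink->preloader = p;  // Preloader -> sink -> Preloader
    observer = p;
  }  // the outside reference is dropped here
  const bool keptAlive = !observer.expired();
  // Break the cycle by hand so the sketch itself does not leak.
  if (auto p = observer.lock()) {
    p->strongSink->preloader.reset();
  }
  assert(observer.expired());
  return keptAlive;
}

// Returns true if the preloader is destroyed as soon as its only outside
// reference is gone, which is the behavior a weak back-reference gives.
bool WeakSinkLetsPreloaderDie() {
  std::weak_ptr<Preloader> observer;
  {
    auto p = std::make_shared<Preloader>();
    p->weakSink = std::make_shared<WeakSink>();
    p->weakSink->preloader = p;  // does not add a strong reference
    observer = p;
  }
  return observer.expired();
}
```

With the strong back-reference nothing ever frees the pair unless someone explicitly breaks the link; with the weak one, the sink simply observes that the preloader has gone away.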

Assignee: nobody → emilio
Status: NEW → ASSIGNED

I went ahead and sent the patch anyways, as it is correct and fixes the leak. Will keep digging on why we don't drop the channel otherwise in NotifyStop as that can be a correctness issue too.

So, I found some leaking channels, there's more than one, but the one I caught is from a Link header for a font:

Link: <https://satchel.rei.com/media/font/Graphik/Graphik-VF/Graphik-VF-Web.woff2>;rel=\"preload\";as=\"font\";type=\"font/woff2\";crossorigin,<https://satchel.rei.com/media/font//Graphik/Graphik/Graphik-Semibold...

The FetchPreloader is a bit different but uses the same mechanism as the script one so it's not too surprising.

It gets its NotifyStop method called, but with the original channel, and presumably there's a redirect involved so mChannel ends up set to something different.

Ah, no, that's wrong, we just keep mChannel being non-null intentionally: https://searchfox.org/mozilla-central/rev/3d53187b90605ccef321c9cadbba762ad06a6381/uriloader/preload/FetchPreloader.cpp#208-212

So the assertion I tried to add to catch the broken case is not quite right then, we can get dropped with a non-null reference to a channel... So the issue is that necko isn't dropping the callbacks, and I need to improve my assertion :-)

Per matrix discussion, this is safer to uplift (instead of relying on
the RedirectSink being released on the main thread).

Depends on D91259

Pushed by ealvarez@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/22bfabd8ad78
Don't create a reference cycle when catching redirects of a preload. r=smaug
https://hg.mozilla.org/integration/autoland/rev/8572515cd187
Add MainThreadWeakPtr and use it in PreloaderBase::RedirectSink. r=sg,smaug
Pushed by emilio@crisal.io:
https://hg.mozilla.org/integration/autoland/rev/fc347ef36a1b
Add a missing "explicit" in a test.

So I dug a bit more in order to try to fix this for good. The extra reference comes from here:

#0  mozilla::PreloaderBase::RedirectSink::AddRef() (this=0x7fe2300672e0) at /home/emilio/src/moz/gecko/uriloader/preload/PreloaderBase.cpp:56
#1  0x00007fe2589fb6d5 in nsCOMPtr<nsIInterfaceRequestor>::assign_with_AddRef(nsISupports*) (this=0x7fe230014c08, aRawPtr=0x7fe2300672e0) at /home/emilio/src/moz/gecko/obj-debug-no-sccache/dist/include/nsCOMPtr.h:1178
#2  nsCOMPtr<nsIInterfaceRequestor>::operator=(nsIInterfaceRequestor*) (this=0x7fe230014c08, aRhs=0x7fe2300672e0) at /home/emilio/src/moz/gecko/obj-debug-no-sccache/dist/include/nsCOMPtr.h:691
#3  0x00007fe2589f7060 in nsBaseChannel::SetNotificationCallbacks(nsIInterfaceRequestor*) (this=0x7fe230014ae0, aCallbacks=0x7fe2300672e0) at /home/emilio/src/moz/gecko/netwerk/base/nsBaseChannel.cpp:535
#4  0x00007fe258d998ea in mozilla::net::HttpBaseChannel::SetupReplacementChannel(nsIURI*, nsIChannel*, bool, unsigned int)
    (this=0x7fe231741830, newURI=0x7fe23004d8b0, newChannel=<optimized out>, preserveMethod=<optimized out>, redirectFlags=2) at /home/emilio/src/moz/gecko/netwerk/protocol/http/HttpBaseChannel.cpp:3819
#5  0x00007fe258da349d in mozilla::net::HttpChannelChild::SetupRedirect(nsIURI*, mozilla::net::nsHttpResponseHead const*, unsigned int const&, nsIChannel**)
    (this=0x7fe231741800, uri=0x7fe23004d8b0, responseHead=0x7fe233c0d1e8, redirectFlags=@0x7fe233c0d09c: 2, outChannel=0x7ffc79011160) at /home/emilio/src/moz/gecko/netwerk/protocol/http/HttpChannelChild.cpp:1309
#6  0x00007fe258da3b10 in mozilla::net::HttpChannelChild::Redirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTSubstring<char> const&, unsigned long const&, mozilla::net::ResourceTimingStructArgs const&)
    (this=0x7fe231741800, registrarId=@0x7fe233c0d008: 52, newOriginalURI=..., newLoadFlags=@0x7fe233c0d098: 262144, redirectFlags=@0x7fe233c0d09c: 2, loadInfoForwarder=..., responseHead=..., securityInfoSerialization=..., channelId=@0x7fe233c0d2b0: 0, timing=...) at /home/emilio/src/moz/gecko/netwerk/protocol/http/HttpChannelChild.cpp:1358
#7  0x00007fe258db53f2 in mozilla::net::HttpChannelChild::RecvRedirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTString<char> const&, unsigned long const&, mozilla::net::NetAddr const&, mozilla::net::ResourceTimingStructArgs const&)::$_30::operator()() const (this=<optimized out>)
    at /home/emilio/src/moz/gecko/netwerk/protocol/http/HttpChannelChild.cpp:1271
#8  std::__invoke_impl<void, mozilla::net::HttpChannelChild::RecvRedirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTString<char> const&, unsigned long const&, mozilla::net::NetAddr const&, mozilla::net::ResourceTimingStructArgs const&)::$_30&>(std::__invoke_other, mozilla::net::HttpChannelChild::RecvRedirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTString<char> const&, unsigned long const&, mozilla::net::NetAddr const&, mozilla::net::ResourceTimingStructArgs const&)::$_30&) (__f=...) at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/invoke.h:60
#9  std::__invoke_r<void, mozilla::net::HttpChannelChild::RecvRedirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTString<char> const&, unsigned long const&, mozilla::net::NetAddr const&, mozilla::net::ResourceTimingStructArgs const&)::$_30&>(mozilla::net::HttpChannelChild::RecvRedirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTString<char> const&, unsigned long const&, mozilla::net::NetAddr const&, mozilla::net::ResourceTimingStructArgs const&)::$_30&) (__fn=...) at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/invoke.h:110
#10 std::_Function_handler<void (), mozilla::net::HttpChannelChild::RecvRedirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTString<char> const&, unsigned long const&, mozilla::net::NetAddr const&, mozilla::net::ResourceTimingStructArgs const&)::$_30>::_M_invoke(std::_Any_data const&) (__functor=...)
    at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/std_function.h:291
#11 0x00007fe258d3ae5e in mozilla::net::ChannelEventQueue::RunOrEnqueue(mozilla::net::ChannelEvent*, bool) (this=0x7fe235aac280, aCallback=<optimized out>, aAssertionWhenNotQueued=false)
    at /home/emilio/src/moz/gecko/obj-debug-no-sccache/dist/include/mozilla/net/ChannelEventQueue.h:240
#12 0x00007fe258da31b6 in mozilla::net::HttpChannelChild::RecvRedirect1Begin(unsigned int const&, mozilla::ipc::URIParams const&, unsigned int const&, unsigned int const&, mozilla::net::ParentLoadInfoForwarderArgs const&, mozilla::net::nsHttpResponseHead const&, nsTString<char> const&, unsigned long const&, mozilla::net::NetAddr const&, mozilla::net::ResourceTimingStructArgs const&)
    (this=<optimized out>, aRegistrarId=<optimized out>, aNewUri=..., aNewLoadFlags=@0x7ffc79011540: 2030108672, aRedirectFlags=@0x7ffc79011450: 2030114864, aLoadInfoForwarder=..., aResponseHead=..., aSecurityInfoSerialization=<gNullChar> "", aChannelId=@0x7ffc79011808: 0, aOldPeerAddr=..., aTiming=...) at /home/emilio/src/moz/gecko/netwerk/protocol/http/HttpChannelChild.cpp:1267
#13 0x00007fe2591ae9af in mozilla::net::PHttpChannelChild::OnMessageReceived(IPC::Message const&) (this=<optimized out>, msg__=...) at PHttpChannelChild.cpp:644
#14 0x00007fe2590c91f0 in mozilla::dom::PContentChild::OnMessageReceived(IPC::Message const&) (this=0x7fe2634ca020, msg__=...) at PContentChild.cpp:8656
#15 0x00007fe258fe7763 in mozilla::ipc::MessageChannel::DispatchAsyncMessage(mozilla::ipc::ActorLifecycleProxy*, IPC::Message const&) (this=0x7fe2634ca118, aProxy=0x7fe2449e5440, aMsg=...)
    at /home/emilio/src/moz/gecko/ipc/glue/MessageChannel.cpp:2150
#16 0x00007fe258fe6596 in mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message&&) (this=0x7fe2634ca118, aMsg=...) at /home/emilio/src/moz/gecko/ipc/glue/MessageChannel.cpp:2074
#17 0x00007fe258fe6d29 in mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::MessageChannel::MessageTask&) (this=0x7fe2634ca118, aTask=...) at /home/emilio/src/moz/gecko/ipc/glue/MessageChannel.cpp:1922
#18 0x00007fe258fe7070 in mozilla::ipc::MessageChannel::MessageTask::Run() (this=0x7fe23e376890) at /home/emilio/src/moz/gecko/ipc/glue/MessageChannel.cpp:1953
#19 0x00007fe2588d2073 in mozilla::SchedulerGroup::Runnable::Run() (this=0x7fe231779e40) at /home/emilio/src/moz/gecko/xpcom/threads/SchedulerGroup.cpp:146
#20 0x00007fe2588f6180 in mozilla::RunnableTask::Run() (this=0x7fe23003dd00) at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:244
#21 0x00007fe2588dc304 in mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) (this=0x7fe2438b9380, aProofOfLock=...)
    at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:514
#22 0x00007fe2588db72f in mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) (this=0x7fe2438b9380, aProofOfLock=...)
    at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:373
#23 0x00007fe2588db893 in mozilla::TaskController::ProcessPendingMTTask(bool) (this=0x7fe2438b9380, aMayWait=false) at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:170
#24 0x00007fe2588eed0d in mozilla::TaskController::InitializeInternal()::$_0::operator()() const (this=<optimized out>) at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:84
#25 mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::$_0>::Run() (this=<optimized out>) at /home/emilio/src/moz/gecko/obj-debug-no-sccache/dist/include/nsThreadUtils.h:577
#26 0x00007fe2588e57cf in nsThread::ProcessNextEvent(bool, bool*) (this=0x7fe2438805f0, aMayWait=<optimized out>, aResult=0x7ffc79013cc7) at /home/emilio/src/moz/gecko/xpcom/threads/nsThread.cpp:1234
#27 0x00007fe2588e8b39 in NS_ProcessNextEvent(nsIThread*, bool) (aThread=0x7fe2300672e0, aMayWait=false) at /home/emilio/src/moz/gecko/xpcom/threads/nsThreadUtils.cpp:513
#28 0x00007fe258fe975a in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) (this=0x7fe2634ab8d0, aDelegate=0x7ffc79013f10) at /home/emilio/src/moz/gecko/ipc/glue/MessagePump.cpp:87
#29 0x00007fe258f704e3 in MessageLoop::RunInternal() (this=0x7ffc79013f10) at /home/emilio/src/moz/gecko/ipc/chromium/src/base/message_loop.cc:334
#30 0x00007fe258f7043b in MessageLoop::RunHandler() (this=0x7ffc79013f10) at /home/emilio/src/moz/gecko/ipc/chromium/src/base/message_loop.cc:327

We're setting up some kind of redirect via HttpChannelChild::RecvRedirect1Begin, so we end up in HttpBaseChannel::SetupReplacementChannel with a newChannel that is a mozilla::net::nsInputStreamChannel. The redirect process works out alright afaict (we properly go through nsAsyncRedirectVerifyHelper, call OnStopRequest, etc).

But then RedirectSink::AsyncOnChannelRedirect keeps the new channel alive (mRedirectChannel = aNewChannel), and the new nsInputStreamChannel never releases its callbacks, so we remain with a cyclic reference there. That cyclic reference is supposed to be cleaned up in OnRedirectResult but that call never arrives. I think that's the underlying bug. The last point in the content process before OnStopRequest where we do something interesting with the channel is here, with this stack:

#1  0x00007fe258da57be in non-virtual thunk to mozilla::net::HttpChannelChild::OnRedirectVerifyCallback(nsresult) () at /home/emilio/src/moz/gecko/obj-debug-no-sccache/dist/include/nsIChildChannel.h:29
#2  0x00007fe2589fcc49 in mozilla::net::nsAsyncVerifyRedirectCallbackEvent::Run (this=0x7fe2300f91c0) at /home/emilio/src/moz/gecko/netwerk/base/nsAsyncRedirectVerifyHelper.cpp:40
#3  0x00007fe2588d2073 in mozilla::SchedulerGroup::Runnable::Run (this=0x7fe2300f9200) at /home/emilio/src/moz/gecko/xpcom/threads/SchedulerGroup.cpp:146
#4  0x00007fe2588f6180 in mozilla::RunnableTask::Run (this=0x7fe2317a0900) at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:244
#5  0x00007fe2588dc304 in mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal (this=0x7fe2438b9380, aProofOfLock=...) at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:514
#6  0x00007fe2588db72f in mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal (this=0x7fe2438b9380, aProofOfLock=...) at /home/emilio/src/moz/gecko/xpcom/threads/TaskController.cpp:373
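
For readers less familiar with this code, the remaining cycle described before the stack above (the sink holding the new channel, and the channel holding the sink as its callbacks) can be modelled like this. It is a simplified sketch under stated assumptions: `Sink` and `Channel` are hypothetical stand-ins, the method names only mirror the ones mentioned in this comment, and `std::shared_ptr` replaces Gecko's refcounting.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Sink;

// Hypothetical stand-in for the replacement channel.
struct Channel {
  std::shared_ptr<Sink> callbacks;  // channel -> sink (notification callbacks)
};

// Hypothetical stand-in for the redirect sink.
struct Sink {
  std::shared_ptr<Channel> redirectChannel;  // sink -> channel

  void AsyncOnChannelRedirect(std::shared_ptr<Channel> newChannel) {
    redirectChannel = std::move(newChannel);
  }
  // The only place in this sketch where the sink lets go of the channel.
  void OnRedirectResult() { redirectChannel.reset(); }
};

// Returns true if the channel is still alive after every outside reference
// has been dropped. That happens only when OnRedirectResult never arrives.
bool ChannelOutlivesOwners(bool deliverRedirectResult) {
  std::weak_ptr<Channel> channelObserver;
  std::weak_ptr<Sink> sinkObserver;
  {
    auto sink = std::make_shared<Sink>();
    auto channel = std::make_shared<Channel>();
    channel->callbacks = sink;
    sink->AsyncOnChannelRedirect(channel);
    channelObserver = channel;
    sinkObserver = sink;
    if (deliverRedirectResult) {
      sink->OnRedirectResult();
    }
  }  // outside references to both objects are dropped here
  const bool stillAlive = !channelObserver.expired();
  // Deliver the missing notification by hand so the sketch does not leak.
  if (auto sink = sinkObserver.lock()) {
    sink->OnRedirectResult();
  }
  assert(channelObserver.expired() && sinkObserver.expired());
  return stillAlive;
}
```

In this model `ChannelOutlivesOwners(false)` reports the channel as still alive, while `ChannelOutlivesOwners(true)` does not, which matches the description that the cleanup depends entirely on the OnRedirectResult call arriving.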

At this point I'm not familiar enough with necko to debug what is the right fix for that, I'll try to upload a trace to pernosco so that someone with more Necko experience can diagnose it, because as things are, we still leak the RedirectSink and the nsInputStreamChannel, as I understand it (we just don't leak the whole window).

For reference (if all goes well and I can send the trace just fine, this is a ryzen machine so rr support is recent), the leaked Preloaders (without my patch) are:

PreloaderBase::NotifyOpen(0x7fe23002bc00, aChannel = 0x7fe231741080, aIsPreload = 0, sink = 0x7fe230067150, uri = https://t.channeladvisor.com/v2/12021789.js)
PreloaderBase::NotifyOpen(0x7fe23002ec00, aChannel = 0x7fe231743080, aIsPreload = 0, sink = 0x7fe2300678d0, uri = https://www.googletagmanager.com/gtag/js?id=DC-4362844)
PreloaderBase::NotifyOpen(0x7fe23002c400, aChannel = 0x7fe231741880, aIsPreload = 0, sink = 0x7fe2300672e0, uri = https://resources.xg4ken.com/js/v2/ktag.js?tid=KT-N4270-3EB)
PreloaderBase::NotifyOpen(0x7fe23002d400, aChannel = 0x7fe231742880, aIsPreload = 0, sink = 0x7fe230067420, uri = https://connect.facebook.net/en_US/fbevents.js)
PreloaderBase::NotifyOpen(0x7fe2359c0800, aChannel = 0x7fe235ddd080, aIsPreload = 0, sink = 0x7fe235dfbba0, uri = https://s2.go-mpulse.net/boomerang/UDCTN-B4SVJ-MYFJ5-V5Q9A-U84KB)
PreloaderBase::NotifyOpen(0x7fe23593c000, aChannel = 0x7fe234561880, aIsPreload = 0, sink = 0x7fe23598c240, uri = https://recs.richrelevance.com/rrserver/p13n_generated.js?a=30280c406d639577&ts=1600969520980&v=1.2.6.20180926&ssl=t&pt=%7Chome_page.core_ol_rec_1%7Chome_page.core_ol_rec_2&u=55146031833627079542704514391466723154&s=551460318336270795427045143914667231542020824&cts=https%3A%2F%2Fwww.rei.com&l=1)

For my own reference just so I don't forget the pernosco trace name and so on:

Uploading 876064021 bytes to s3://pernosco-upload/zqsTFnLOSyg.tar.zst...
upload: ../../../../../tmp/tmpelf7upji to s3://pernosco-upload/zqsTFnLOSyg.tar.zst

(In reply to Emilio Cobos Álvarez (:emilio) from comment #98)

At this point I'm not familiar enough with necko to debug what is the right fix for that, I'll try to upload a trace to pernosco so that someone with more Necko experience can diagnose it, because as things are, we still leak the RedirectSink and the nsInputStreamChannel, as I understand it (we just don't leak the whole window).

Thanks for digging into this Emilio. I'd be happy to take a look at a rr trace if you have it.
I filed bug 1667316 for it.

Flags: needinfo?(valentin.gosu)

The patch landed in nightly and beta is affected.
:emilio, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(emilio)

Comment on attachment 9177594 [details]
Bug 1658571 - Don't create a reference cycle when catching redirects of a preload. r=smaug

Beta/Release Uplift Approval Request

  • User impact if declined: Leaks, at least with some add-ons, that slow down the whole browser over time.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: comment 74
  • List of other uplifts needed: none
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Simple patch to avoid a reference cycle that under some circumstances involving a necko bug isn't broken up.
  • String changes made/needed: none

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: Leaked windows
  • User impact if declined: See above
  • Fix Landed on Version: 83
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): See above. May need a rebase for ESR but should be straight-forward.
  • String or UUID changes made by this patch: none
Flags: needinfo?(emilio)
Attachment #9177594 - Flags: approval-mozilla-release?
Attachment #9177594 - Flags: approval-mozilla-esr78?
Attachment #9177594 - Flags: approval-mozilla-beta?
Attachment #9177618 - Flags: approval-mozilla-release? approval-mozilla-esr78?
Attachment #9177618 - Flags: approval-mozilla-beta?
Flags: qe-verify+

Comment on attachment 9177594 [details]
Bug 1658571 - Don't create a reference cycle when catching redirects of a preload. r=smaug

approved for 82.0b5

Attachment #9177594 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9177618 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
QA Whiteboard: [qa-triaged]

Comment on attachment 9177594 [details]
Bug 1658571 - Don't create a reference cycle when catching redirects of a preload. r=smaug

Fixes a bad leak for users of the Disconnect addon and maybe others. Approved for 81.0.1.

Attachment #9177594 - Flags: approval-mozilla-release? → approval-mozilla-release+
Attachment #9177618 - Flags: approval-mozilla-release? → approval-mozilla-release+

I have reproduced this issue in Nightly v83.0a1, Beta v82.0b4 and Release v81.0 on Windows 10, Ubuntu 20 and Mac OS 10.15.
I have verified the fix in latest Nightly and Beta v82.0b5 on Windows 10, Ubuntu 20 and Mac OS 10.15.

Waiting for the release of 81.0.1 to verify there as well. NI to me.

Flags: qe-verify+ → needinfo?(daniel.bodea)

Where are my manners? Thanks everyone for *finally* reacting! (Maybe it wouldn't hurt to set up unit-tests for different kinds of add-ons to see how FF behaves before publishing. It seems like some chain of command could've solved this crisis many weeks ago when it was first reported.)

Added to the Firefox 81.0.1 relnotes:

Fixed high memory growth with addons such as Disconnect installed, causing browser responsiveness issues over time

Thanks all for getting this fixed and shipped at the earliest version possible.

This fix has been verified on Windows 10, Ubuntu 20 and Mac OS 10.15 in Release v81.0.1 as well. Thanks.

Status: RESOLVED → VERIFIED
Flags: needinfo?(daniel.bodea)
OS: Unspecified → All
Hardware: Unspecified → Desktop

This doesn't graft cleanly to esr78, there are conflicts in particular with bug 1653011.

Flags: needinfo?(emilio)

Simpler one-off. Simon, can you sanity-check this?

Flags: needinfo?(emilio) → needinfo?(sgiesecke)
Attachment #9179267 - Flags: feedback?(sgiesecke)
Attachment #9179267 - Attachment description: Alternative part 2 → Alternative part 2 for ESR78
Comment on attachment 9179267 [details] [diff] [review]
Alternative part 2 for ESR78

Review of attachment 9179267 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM. nit: I think `ForgetRef` would be a more consistent name than `TakeRef`, but given this will probably only be used at this single place, it doesn't matter much.
Attachment #9179267 - Flags: feedback?(sgiesecke) → feedback+

Yeah, it's a one-off so not gonna bother much :)

Flags: needinfo?(sgiesecke)

Comment on attachment 9177618 [details]
Bug 1658571 - Add MainThreadWeakPtr and use it in PreloaderBase::RedirectSink. r=sg!,smaug!

Approved for ESR

Attachment #9177618 - Flags: approval-mozilla-esr78? → approval-mozilla-esr78+

Comment on attachment 9177594 [details]
Bug 1658571 - Don't create a reference cycle when catching redirects of a preload. r=smaug

We are taking the updated part 2

Attachment #9177594 - Flags: approval-mozilla-esr78? → approval-mozilla-esr78-
Attachment #9179267 - Flags: approval-mozilla-esr78+

I cannot reproduce this issue on ESR v78.2.0esr, v78.3.0esr or v78.3.1esr on Windows 10 and Mac OS 10.15 using the steps in comment 74. That being said, this means I cannot properly verify the fix on ESR, as I did on the other channels.

Should I use a different method?

Flags: needinfo?(emilio)

hmm, not sure off-hand. There are other steps further up in the bug but they were not so reliable for me...

Flags: needinfo?(emilio)

Also removing the [qa-triaged] tag. Please put the qe-verify+ tag back if further investigation is necessary.

QA Whiteboard: [qa-triaged]