Open Bug 1867074 Opened 1 year ago Updated 8 months ago

60% CPU usage of content process because of memory leak in gc-heap on Amazon product page

Categories

(Core :: Performance, defect)

Tracking

Performance Impact low

People

(Reporter: whimboo, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: perf:resource-use)

Attachments

(1 file)

Attached file memory-report.json.gz

### Basic information
I opened this page a couple of days ago in a Private Browsing window and forgot about it. Now the whole content process (PID 67326) uses ~5GB of memory, triggers GC runs quite often, and keeps the CPU usage of the content process at roughly 60%.

Firefox profile (general): https://share.firefox.dev/410Njcq
Firefox profile (power): https://share.firefox.dev/3Gj1m3p

5,215,371,136 B (100.0%) -- explicit
├──5,007,415,344 B (96.01%) -- window-objects
│  ├──4,941,292,960 B (94.74%) -- top(https://www.amazon.de/Milchaufsch%C3%A4umer-Milchsch%C3%A4umer-Automatischer-Milchbeh%C3%A4lter-Milchschaum/dp/B0C6PJ2GRD/ref=sr_1_4?__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&crid=1B09KTXF86UCD&keywords=severin%2Bmilchsch%C3%A4umer&psr=EY17&qid=1700923841&s=black-friday&sprefix=severin%2Bmilchsch%C3%A4umer%2Cblack-friday%2C80&sr=1-4&th=1, id=447)
│  │  ├──4,819,175,736 B (92.40%) -- active
│  │  │  ├──4,818,869,696 B (92.40%) -- window(https://www.amazon.de/Milchaufsch%C3%A4umer-Milchsch%C3%A4umer-Automatischer-Milchbeh%C3%A4lter-Milchschaum/dp/B0C6PJ2GRD/ref=sr_1_4?__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&crid=1B09KTXF86UCD&keywords=severin%2Bmilchsch%C3%A4umer&psr=EY17&qid=1700923841&s=black-friday&sprefix=severin%2Bmilchsch%C3%A4umer%2Cblack-friday%2C80&sr=1-4&th=1)
│  │  │  │  ├──4,786,039,176 B (91.77%) -- js-realm(https://www.amazon.de/Milchaufsch%C3%A4umer-Milchsch%C3%A4umer-Automatischer-Milchbeh%C3%A4lter-Milchschaum/dp/B0C6PJ2GRD/ref=sr_1_4?__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&crid=1B09KTXF86UCD&keywords=severin+milchsch%C3%A4umer&psr=EY17&qid=1700923841&s=black-friday&sprefix=severin+milchsch%C3%A4umer%2Cblack-friday%2C80&sr=1-4)
│  │  │  │  │  ├──4,781,187,344 B (91.67%) -- classes
│  │  │  │  │  │  ├──4,774,485,448 B (91.55%) -- class(Object)/objects
│  │  │  │  │  │  │  ├──4,152,761,800 B (79.63%) ── gc-heap
│  │  │  │  │  │  │  └────621,723,648 B (11.92%) -- malloc-heap
│  │  │  │  │  │  │       ├──620,808,608 B (11.90%) ── elements/normal
│  │  │  │  │  │  │       └──────915,040 B (00.02%) ── slots
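
A breakdown like the one above can probably be recomputed from the attached memory-report.json.gz. The following is only a minimal sketch: it assumes the file saved by about:memory is gzipped JSON with a top-level `reports` list whose entries carry `process`, `path`, `units` and `amount` fields, and that `units == 0` means bytes. The helper names `load_reports` and `sum_by_prefix` are made up for illustration.

```python
import gzip
import json
from collections import defaultdict


def load_reports(filename):
    """Read a gzipped about:memory report and return its list of reports.

    Assumption: the top-level JSON object has a "reports" key.
    """
    with gzip.open(filename, "rt", encoding="utf-8") as f:
        return json.load(f)["reports"]


def sum_by_prefix(reports, process_substr, depth):
    """Sum `amount` per path prefix (first `depth` segments) for one process."""
    totals = defaultdict(int)
    for report in reports:
        # Only look at the process we care about, e.g. "pid 67326".
        if process_substr not in report.get("process", ""):
            continue
        # Assumption: units == 0 means the amount is in bytes.
        if report.get("units", 0) != 0:
            continue
        prefix = "/".join(report["path"].split("/")[:depth])
        totals[prefix] += report["amount"]
    return dict(totals)
```

For the report in this bug, something like `sum_by_prefix(load_reports("memory-report.json.gz"), "pid 67326", depth=2)` should give per-subtree totals such as `explicit/window-objects`, provided the assumptions about the file format hold.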

Steps to Reproduce:
Probably open the page and keep it running for quite some time in the background. As of now I'm not sure if it is required to open it inside a Private Browsing window or not.

Expected Results:
There shouldn't be such a high memory consumption of the content process.

Actual Results:
About 5GB of memory is used by the content process hosting Amazon.

System configuration:

OS version: macOS 13.6
GPU model: Apple M1 Pro
Number of cores: 10 (8 performance and 2 efficiency)
Amount of memory (RAM): 32GB

I actually had several tabs with different products open in this Private Browsing window, and two of them showed this high memory usage.

(In reply to Henrik Skupin [:whimboo][⌚️UTC+1] from comment #0)

Firefox profile (general): https://share.firefox.dev/410Njcq
Firefox profile (power): https://share.firefox.dev/3Gj1m3p

These 2 profiles contain only a few threads of the parent process, so they probably don't show what you wanted to share.

(In reply to Florian Quèze [:florian] from comment #2)

These 2 profiles contain only a few threads of the parent process, so they probably don't show what you wanted to share.

Hm, looks like publishing the profiles failed. :( I would have to wait until the memory is going up again now given that I had to refresh those tabs. I hope that the attached memory report will be at least helpful.

(In reply to Henrik Skupin [:whimboo][⌚️UTC+1] from comment #3)

(In reply to Florian Quèze [:florian] from comment #2)

These 2 profiles contain only a few threads of the parent process, so they probably don't show what you wanted to share.

Hm, looks like publishing the profiles failed. :( I would have to wait until the memory is going up again now given that I had to refresh those tabs. I hope that the attached memory report will be at least helpful.

There is probably an extra checkbox you need to check to include data from private windows in what you upload.

I've had the chance to catch this situation again. Now the content process for the Amazon product page alone uses 5GB of memory and shows roughly a 50% CPU load. This time I hope that the recorded Firefox profile contains all the information as needed: https://share.firefox.dev/3uXwpQ6. Please let me know - I'll leave the tab open for a bit in case something needs further checks.

Flags: needinfo?(florian)

(In reply to Henrik Skupin [:whimboo][⌚️UTC+1] from comment #5)

I've had the chance to catch this situation again. Now the content process for the Amazon product page alone uses 5GB of memory and shows roughly a 50% CPU load. This time I hope that the recorded Firefox profile contains all the information as needed: https://share.firefox.dev/3uXwpQ6.

Here is what I see in the profile:

  • the page uses timers every 4ms!
  • there's a CSS animation playing continuously
  • I don't see an obvious memory leak in the profile, but there's a GCMajor marker taking 10.5s, with many 100ms GC slices. The JS heap size is 3.98GB, so memory use does seem excessive (it's hard to say if it happened all at once or very slowly over time).
  • your profile was captured with the "jsallocations" feature, which might cause overhead and make the timings in the profile less relevant
  • the process priority is "foreground", which is normal as you said the tab was in the foreground. Putting the tab in the background would throttle both timers and CSS animations, reducing CPU use which isn't related to the excessive memory use, and hopefully making what you are trying to show in this bug more visible.
  • outside of the time spent GC'ing, the biggest power use spikes happen when accessibility code runs. I'm not sure if it's something expensive that triggers notifications consumed by the accessibility code or if it's accessibility code itself. If you didn't expect accessibility to be enabled, this might be worth looking into.
Flags: needinfo?(florian)

As requested by Florian, here is a new recording of the Amazon product page, this time with the tab in the background and jsallocations turned off: https://share.firefox.dev/3RIoe2T.

And here is one more profile with the timer thread enabled as well: https://share.firefox.dev/47ZwVLO

The behavior when closing the tab is also strange. The memory used by the process goes up to 10GB, drops back to 5GB after a while, and this repeats again and again... Here is a shorter recorded profile: https://share.firefox.dev/3RkOE9D

Jon or Paul, could one of you maybe have a look at those recorded profiles? Maybe you have an idea what's wrong?

Note that given the excessive power utilization of this process I'll have to restart Firefox now, and it might take me a couple of days to get into the same state again.

Flags: needinfo?(pbone)
Flags: needinfo?(jcoppeard)

(In reply to Henrik Skupin [:whimboo][⌚️UTC+1] from comment #8)
This is probably the page leaking memory.

There is also an unfortunate situation where we run last ditch GCs very often when the heap approaches its limit. That may be making things worse here as there is one of these GCs in the profile in comment 7. There is a bug on file for this issue but I can't find it right now.

The behavior when closing the tab is also strange. The memory used by the process goes up to 10GB, drops back to 5GB after a while, and this repeats again and again...

This is likely the cycle collector running. AIUI it allocates memory linear in the size of the heap to build its graph.

Flags: needinfo?(jcoppeard)

I agree with Jon, it looks like a page leaking memory. The CC time is a lot larger than the GC time and both are running non-incrementally. If they can't reduce the memory by running non-incrementally, then it's most likely a leak in the page rather than in the browser. I'm not sure if there's anything else we can do; if there is, maybe mccr8 knows.

Flags: needinfo?(pbone) → needinfo?(continuation)

Interesting. What's the best way to figure out where the leak actually is? Should I run the profiler over a longer period of time with the allocations enabled, or is there a better way? The increase of memory is slow and needs to run through a couple of days.

(In reply to Paul Bone [:pbone] from comment #10)

I agree with Jon, it looks like a page leaking memory. The CC time is a lot larger than the GC time and both are running non-incrementally. If they can't reduce the memory by running non-incrementally, then it's most likely a leak in the page rather than in the browser. I'm not sure if there's anything else we can do; if there is, maybe mccr8 knows.

It's very likely that the page is slowly leaking, but from my point of view that's not our problem. The problem I see is that the amount of CPU time (and power) used by the GC/CC is several orders of magnitude higher than the resource use of the code of the page itself. In my opinion, when a page is leaking, and we already tried multiple times to free memory without achieving any significant heap size reduction, we should significantly reduce the frequency at which we trigger GC/CC, so that the memory use problem does not become a battery life / power use problem. Is there a bug covering this already?
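
To make the proposal above concrete, here is a rough sketch of such a back-off policy. It is purely illustrative and is not how Firefox schedules GC/CC: the class name `GcBackoff`, the thresholds and the doubling strategy are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class GcBackoff:
    """Hypothetical policy: collect less often once collections stop helping."""

    base_interval_s: float = 5.0       # assumed normal delay between collections
    max_interval_s: float = 600.0      # assumed upper bound for the delay
    min_freed_fraction: float = 0.05   # below this a collection counts as futile
    futile_limit: int = 3              # futile runs tolerated before backing off
    interval_s: float = 5.0            # current delay, starts at the base value
    futile_runs: int = 0               # consecutive futile collections seen

    def record_collection(self, heap_before: int, heap_after: int) -> float:
        """Record one collection and return the delay before the next one."""
        freed = max(heap_before - heap_after, 0)
        fraction = freed / heap_before if heap_before else 0.0
        if fraction >= self.min_freed_fraction:
            # The collection reclaimed a meaningful share of the heap: reset.
            self.futile_runs = 0
            self.interval_s = self.base_interval_s
        else:
            self.futile_runs += 1
            if self.futile_runs >= self.futile_limit:
                # Repeatedly futile: double the delay, up to the cap.
                self.interval_s = min(self.interval_s * 2, self.max_interval_s)
        return self.interval_s
```

With these made-up defaults, a heap that stays at about 5GB across collections would see the delay stay at 5 seconds for the first two futile runs and then double on each further futile run until it reaches the 10 minute cap. A single collection that frees at least 5% of the heap resets it.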

Flags: needinfo?(pbone)
Flags: needinfo?(jcoppeard)

To extend what Florian just wrote: if such a feature existed, would there be a way to, e.g., send out a notification so that Firefox or extensions/tools like my perfchaser extension could handle such a situation and inform the user that a tab/page is leaking memory?
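
As a sketch of what a tool could do with such data, the snippet below flags a tab whose memory grows steadily over time. It is only an illustration: nothing in this bug confirms an API that delivers these samples, the function names are hypothetical, and the 50 MiB per hour threshold is an arbitrary assumption.

```python
def growth_rate(samples):
    """Least-squares slope in bytes per second for (time_s, bytes) samples."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_b = sum(b for _, b in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        # All samples share one timestamp, so no trend can be estimated.
        return 0.0
    cov = sum((t - mean_t) * (b - mean_b) for t, b in samples)
    return cov / var_t


def looks_like_leak(samples, min_bytes_per_hour=50 * 1024 * 1024):
    """Return True if the fitted growth exceeds the (assumed) hourly threshold."""
    return growth_rate(samples) * 3600 >= min_bytes_per_hour
```

Since the growth described in this bug is slow and takes a couple of days, the samples would need to cover many hours for the fitted slope to be meaningful.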

I don't think there are any great ideas. One thing I've thought about is to just give up and treat leaked windows as being alive. That would at least stop the CC using so much CPU. I think normally tab unloading would save you from this specific scenario of "page ignored for days" but I suppose being in a private browsing window stops that.

Flags: needinfo?(continuation)

Can we bring back tab unloading in private windows?

Some context:

  1. On Linux, when Firefox is updated by a package manager, new processes can't be started (bug 1705217)
  2. In this state, selecting an unloaded tab results in the content being replaced with about:restartrequired and the URL bar being cleared
  3. This is a kind of data loss, so tab unloading in private windows was disabled on all platforms in bug 1751366

Bug 1705217 is actively being addressed, and tab unloading was entirely disabled on Linux a few months later due to a different problem (bug 1780058). So to me, it seems like we could bring back tab unloading in private windows, perhaps behind a pref.

Depends on: 1873235

(In reply to Florian Quèze [:florian] from comment #12)

The problem I see is that the amount of CPU time (and power) used by the GC/CC is several orders of magnitude higher than the resource use of the code of the page itself.

This is a fair point.

In my opinion, when a page is leaking, and we already tried multiple times to free memory without achieving any significant heap size reduction, we should significantly reduce the frequency at which we trigger GC/CC, so that the memory use problem does not become a battery life / power use problem. Is there a bug covering this already?

This is a known issue but I can't find a bug for it right now. I've filed bug 1873235 for this.

Flags: needinfo?(jcoppeard)

The Performance Impact Calculator has determined this bug's performance impact to be low. If you'd like to request re-triage, you can reset the Performance Impact flag to "?" or needinfo the triage sheriff.

Platforms: [x] Windows [x] macOS [x] Linux [x] Android
Resource impact: Severe

Performance Impact: --- → low
Flags: needinfo?(pbone)