Created attachment 601436
areweslimyet.com (AWSY) is a site that tracks Firefox's memory usage on a workload that opens lots of tabs and websites. It's currently password-protected; please contact me if you want the password.
On February 19 there was a huge regression. I've attached a screenshot from AWSY. The MaxMemoryResident jumped from ~580MiB to ~960MiB. Here are the about:memory details for before (first list) and after (second list):
531.75 MiB explicit
206.06 MiB (38.75%) images
110.66 MiB (20.81%) heap-unclassified
88.25 MiB (16.60%) js
59.39 MiB (11.17%) dom+style
55.24 MiB (10.39%) layout
5.40 MiB (1.02%) storage
3.10 MiB (0.58%) atom-table
1,567.64 KiB (0.29%) history-links-hashtable
1,242.05 KiB (0.23%) xpti-working-set
716.24 KiB (0.13%) startup-cache
219.86 KiB (0.04%) cycle-collector
0B (0%) spell-check
911.23 MiB explicit
446.90 MiB (49.04%) images
228.37 MiB (25.06%) heap-unclassified
104.52 MiB (11.47%) js
59.44 MiB (6.52%) dom+style
55.28 MiB (6.07%) layout
6.04 MiB (0.66%) atom-table
5.40 MiB (0.59%) storage
3.05 MiB (0.33%) history-links-hashtable
1,242.05 KiB (0.13%) xpti-working-set
716.23 KiB (0.08%) startup-cache
323.66 KiB (0.03%) cycle-collector
0B (0%) spell-check
So the increase is mostly due to images (both source and decoded image data) and heap-unclassified, and a bit of JS. The MaxMemoryResidentForceGC, EndMemoryResident, and MaxMemoryResidentSettled also went up, but by smaller amounts. Also, MaxMemoryResidentSettled has been very noisy since then, including one large spike.
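To make that breakdown concrete, here is a small sketch that diffs the two about:memory dumps and ranks the categories by how much they grew. The numbers are copied from the lists above (KiB entries converted to MiB); the `deltas` helper and the variable names are just illustrative, not part of AWSY or about:memory.

```python
# Per-category sizes in MiB, copied from the two about:memory dumps.
# Entries reported in KiB are converted using 1 MiB = 1024 KiB.
before = {
    "images": 206.06,
    "heap-unclassified": 110.66,
    "js": 88.25,
    "dom+style": 59.39,
    "layout": 55.24,
    "storage": 5.40,
    "atom-table": 3.10,
    "history-links-hashtable": 1567.64 / 1024,
    "xpti-working-set": 1242.05 / 1024,
    "startup-cache": 716.24 / 1024,
    "cycle-collector": 219.86 / 1024,
    "spell-check": 0.0,
}
after = {
    "images": 446.90,
    "heap-unclassified": 228.37,
    "js": 104.52,
    "dom+style": 59.44,
    "layout": 55.28,
    "storage": 5.40,
    "atom-table": 6.04,
    "history-links-hashtable": 3.05,
    "xpti-working-set": 1242.05 / 1024,
    "startup-cache": 716.23 / 1024,
    "cycle-collector": 323.66 / 1024,
    "spell-check": 0.0,
}


def deltas(old, new):
    """Return (category, growth in MiB) pairs, largest growth first."""
    pairs = [(name, new[name] - old[name]) for name in old]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Growth of the "explicit" total between the two dumps.
total_growth = 911.23 - 531.75

ranked = deltas(before, after)
for name, growth in ranked:
    share = 100 * growth / total_growth
    print(f"{name:24s} {growth:+9.2f} MiB  {share:6.1f}% of the increase")
```

Of the ~379 MiB increase in explicit, images accounts for about 240.84 MiB (roughly 63%), heap-unclassified for about 117.71 MiB (roughly 31%), and js for about 16.27 MiB (roughly 4%); everything else is small by comparison.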
The regression range is hg.mozilla.org/mozilla-central/pushloghtml?fromchange=550779e6bab4&tochange=4d47329bb02e. I've CC'd everyone with a patch in this range. Incremental GC is easily the most notable thing in the list, though I don't know why it would affect images so much.
I've assigned this to John Schoenick, the author of AWSY. AWSY has support for triggering measurements of specific changesets; John will use that to identify the changeset that caused the regression. Once he does that, the bug can be reassigned to whoever was responsible.
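One way such a search could go, assuming the changesets in the range can be put in a linear order and memory stays high once the bad changeset has landed, is a plain bisection over per-changeset measurements. This is only a sketch: `first_bad`, the revision names, the readings, and the 770 MiB threshold are all made up for illustration and are not AWSY's actual interface or data.

```python
from typing import Callable, Sequence


def first_bad(changesets: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest changeset for which is_bad() is true.

    Assumes changesets are ordered oldest to newest, that the last one
    is bad, and that every changeset after the first bad one is also bad.
    """
    lo, hi = 0, len(changesets) - 1  # the first bad changeset is in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(changesets[mid]):
            hi = mid       # culprit is mid or something earlier
        else:
            lo = mid + 1   # culprit is strictly after mid
    return changesets[lo]


# Hypothetical MaxMemoryResident readings (MiB) for a made-up range.
readings = {
    "rev0": 581, "rev1": 579, "rev2": 583, "rev3": 580,
    "rev4": 582, "rev5": 958, "rev6": 961, "rev7": 960,
}
measured = []  # records which changesets actually needed a measurement


def regressed(rev: str) -> bool:
    measured.append(rev)
    return readings[rev] > 770  # assumed cutoff between ~580 and ~960


culprit = first_bad(list(readings), regressed)
print(culprit, "found after measuring", measured)
```

Each measurement is expensive, so the point of bisecting is that only about log2(N) changesets in the range need a full run.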
My 2 changesets were test-only: I don't believe they can be related at all.
This was definitely caused by incremental GC. I can reproduce it.
I guess the first question is: if incremental GC is disabled does the regression go away?
(In reply to Nicholas Nethercote [:njn] from comment #3)
> I guess the first question is: if incremental GC is disabled does the
> regression go away?
Yes, setting the preference to false makes the regression go away.
I started by looking at the difference in the TabsOpenForceGC measurements. I was hoping to see an obvious leak, but I didn't find anything. Almost all the increased memory usage seems to be due to increased fragmentation, both in the GC heap and the malloc heap. This is likely to be caused by differences in how the GC and CC are run during the main part of the benchmark, when it's actually opening tabs. So I guess I'll look at that next.
Sorry about the flags. I'd say Bugzilla is in the process of becoming self-aware, but that would be giving it way too much credit.
The problem here seems to be that we're not running the cycle collector at all during the page loads or the settle period. I'll look into the heuristics and try to figure out why that's happening.
I'm going to mark this fixed because incremental GC is disabled now. Also, bug 730853 seemed to eliminate most of the regression. There still seems to be an issue where the Settle usage doesn't match the ForceGC usage, but we should file a separate bug for that once incremental GC is enabled.
Removing tracking: the work in bug 730853 has status-fixed set, and there is nothing to land here, so nothing to track.