Closed Bug 1729890 Opened 3 years ago Closed 3 years ago

7.76 - 3.24% Explicit Memory / Heap Unclassified + 14 more (Linux, OSX, Windows) regression on Tue September 7 2021

Categories

(Core :: JavaScript: GC, defect)

Tracking

RESOLVED FIXED
Tracking Status
firefox-esr78 --- unaffected
firefox-esr91 --- unaffected
firefox92 --- unaffected
firefox93 --- unaffected
firefox94 + fixed

People

(Reporter: bacasandrei, Assigned: pbone)

References

(Regression)

Details

(Keywords: perf, perf-alert, regression)

Perfherder has detected an awsy performance regression from push 891ac85a6625cff469546795bcde4aca9db2a4c1. As the author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio Test Platform Options Absolute values (old vs new)
8% Explicit Memory windows10-64-2004-shippable-qr fission tp6 630,595,157.43 -> 679,549,100.43
8% Explicit Memory windows10-64-2004-shippable-qr fission tp6 636,652,624.65 -> 685,425,600.17
7% Explicit Memory macosx1015-64-shippable-qr fission tp6 658,811,461.16 -> 707,959,499.11
7% Explicit Memory windows10-64-2004-shippable-qr tp6 673,222,929.13 -> 722,181,997.29
7% Resident Memory windows10-64-2004-shippable-qr fission tp6 848,151,762.56 -> 904,550,369.29
7% Explicit Memory linux1804-64-shippable-qr fission tp6 786,161,982.72 -> 837,966,420.08
7% Resident Memory windows10-64-2004-shippable-qr fission tp6 848,594,952.29 -> 904,032,997.36
6% Resident Memory windows10-64-2004-shippable-qr fission tp6 843,578,847.72 -> 897,380,588.82
6% Resident Memory windows10-64-2004-shippable-qr tp6 904,591,744.59 -> 957,633,997.44
6% Heap Unclassified windows10-64-2004-shippable-qr tp6 78,555,694.94 -> 83,143,584.12
... ... ... ... ...
5% Heap Unclassified windows10-64-2004-shippable-qr fission tp6 88,242,519.16 -> 92,672,239.71
5% Resident Memory linux1804-64-shippable-qr fission tp6 1,178,895,503.69 -> 1,236,880,424.08
5% Heap Unclassified windows10-64-2004-shippable-qr fission tp6 87,829,516.62 -> 91,935,546.92
3% Heap Unclassified macosx1015-64-shippable-qr fission tp6 133,384,800.77 -> 137,782,242.96
3% Heap Unclassified linux1804-64-shippable-qr fission tp6 232,138,557.34 -> 239,649,401.17

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(pbone)

Set release status flags based on info from the regressing bug 1728273

== Change summary for alert #31208 (as of Wed, 08 Sep 2021 13:36:46 GMT) ==

Regressions:

Ratio Test Platform Options Absolute values (old vs new)
8% facebook-cristiano dcf android-hw-p2-8-0-android-aarch64-shippable-qr warm webrender 283.44 -> 306.12

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
3% twitter LastVisualChange linux1804-64-shippable-qr cold fission webrender 2,945.00 -> 2,856.67

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=31208

Depends on: 1729057

It's really strange that making more GCs full GCs would increase memory usage. Just wondering if you have an idea of what's going on here, Paul?

A few notes as we investigated:

  • The awsy subtest here is the "after tabs open" test rather than when GC is forced.
  • For the measurement after the GC, there is no change in numbers.
  • This is an increase in the explicit heap, but there is no change in JS heap.
  • So it seems that we have a transient increase in memory (which still could be considered a defect)
  • We wonder if the full-GC is throwing off the CC heuristics and we are somehow keeping old windows around longer than before?

If you look under artifacts, you can download the memory reports for After Tabs Open before and after the change and then use about:memory in your browser to diff them to see what the difference is.
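If diffing in about:memory is awkward, the same comparison can be scripted: the saved memory reports are gzipped JSON files with a top-level "reports" array of {process, path, amount} entries. A minimal sketch (the 1 MiB threshold is an illustrative choice):

```python
import gzip
import json
from collections import defaultdict

def load_report(path):
    """Load an about:memory dump (gzipped JSON) into {(process, path): bytes}."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        data = json.load(f)
    totals = defaultdict(int)
    for r in data["reports"]:
        totals[(r["process"], r["path"])] += r["amount"]
    return totals

def diff_reports(before, after, threshold=1024 * 1024):
    """Return (key, delta) pairs whose usage changed by at least threshold bytes,
    largest absolute change first."""
    deltas = []
    for key in set(before) | set(after):
        delta = after.get(key, 0) - before.get(key, 0)
        if abs(delta) >= threshold:
            deltas.append((key, delta))
    return sorted(deltas, key=lambda kv: -abs(kv[1]))
```

Running `diff_reports(load_report("before.json.gz"), load_report("after.json.gz"))` then points at the paths (heap-unclassified, bin-unused, ...) that account for the regression.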

That is an oddly large amount of heap-unclassified. I wonder what we could be missing in terms of a reporter. There's some way to enable DMD with AWSY but I don't know how well it works. There's also an increase of 26MB of JS memory in the main process, and an 18MB increase with Gmail (in JS and some other stuff). Similarly for live.com.

Something definitely has gone wrong here. We're not cleaning up nearly as much stuff as normal for whatever reason. The forced GC passes true to the minimize memory usage argument to dumpMemoryReportsToNamedFile().

./mach awsy-test --tp6 --dmd --headless actually worked reasonably well. Something asserts at the end of the script, but I can still run dmd.py on the files in %OBJDIR%/_tests/awsy/results... and get what I need.

I see about 4000 HttpChannelParent instances alive in the TabsOpen checkpoint that are gone when the GC is forced. These seem to account for the bulk of the extra heap-unclassified that we are seeing (tens of kB per channel adds up really fast when there are 4000 of them).
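As a sanity check on that estimate (the ~12 KiB per channel below is an assumed stand-in for "tens of kB"; the actual per-channel size wasn't measured):

```python
channels = 4000                  # HttpChannelParent instances seen in the TabsOpen report
bytes_per_channel = 12 * 1024    # assumed ~12 KiB each; illustrative, not measured
total_mb = channels * bytes_per_channel / (1024 * 1024)
print(f"{total_mb:.0f} MB")      # roughly 47 MB, on the order of the regression
```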

That's strange. I'm not sure what keeps those alive.

(In reply to Jon Coppeard (:jonco) from comment #3)

It's really strange that making more GCs full GCs would increase memory usage. Just wondering if you have an idea of what's going on here Paul?

Thanks for lending a 2nd (& 3rd Ted!) set of eyes.

I was looking at the memory reports and saw that the difference appears to be in bin-unused, which is the fragmentation of the jemalloc heap. But for any cause I could think of, I expected to see a corresponding negative amount for some memory being freed, so I appreciate the extra eyes here.
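For reference, a rough fragmentation figure for the jemalloc bins can be derived from the allocated and bin-unused totals in a memory report. This is an illustrative sketch, not how about:memory itself computes anything:

```python
def bin_fragmentation(allocated, bin_unused):
    """Fraction of jemalloc bin memory that is committed but not handed out."""
    committed = allocated + bin_unused
    return bin_unused / committed if committed else 0.0

# e.g. 600 MiB allocated with 50 MiB of bin-unused:
frac = bin_fragmentation(600 * 2**20, 50 * 2**20)
print(f"{frac:.1%}")  # 7.7%
```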

I started with the Tabs Closed Settled memory reports and saw only this fragmentation. I was unable to get perfherder to show me the subtests due to Bug 1729057, which might have helped direct my search. So my best guess is like Ted said:

We wonder if the full-GC is throwing off the CC heuristics and we are somehow keeping old windows around longer than before?

I want to try messing with the patch to see what extra clues I can get. But I also want to try DMD like Ted did, that may be a more solid lead here.

Thanks everyone.

Assignee: nobody → pbone
Status: NEW → ASSIGNED
Flags: needinfo?(pbone)

Another weirdness (not that we need more) in the pushes I did: although the heap-unclassified numbers are fairly straightforward (we're using a bunch of extra memory after tabs open, until it gets cleared out), the RSS numbers show the extra usage persisting after tabs close. Even in one case where it was temporarily better until tabs closed + 30s or + 30s + GC (at which point it was worse).

It's almost like there was additional stuff to clean up during tab closing that loaded more stuff into memory? But RSS is always a weird measure, so maybe it's better not to speculate too much about it.

Note: I was running into bug 1729057 as well, but somehow the path I went through for the above links works. I clicked on "Compare with another revision" in treeherder, then filled in my try revs and selected awsy.

I will also note that unfortunately --gecko-profile-features does not appear to be implemented for awsy jobs. I wanted to get a marker-only profile out of it, but I didn't get a profile at all.

Here are some marker-only profiles
With Bug 1728273 Patch: https://share.firefox.dev/3Ad8hXg
Reverted: https://share.firefox.dev/39g2Smx

One thing that stands out is how few GCMajor markers happen in the parent process when the patch is there. There are none until the tabs start being closed. When I revert things, there are many GCs as the test case runs.

[Tracking Requested - why for this release]: very large regression on this memory test

(In reply to Steve Fink [:sfink] [:s:] from comment #12)

Another weirdness (not that we need more) in the pushes I did: although the heap-unclassified numbers are fairly straightforward (we're using a bunch of extra memory after tabs open, until it gets cleared out), the RSS numbers show the extra usage persisting after tabs close. Even in one case where it was temporarily better until tabs closed + 30s or + 30s + GC (at which point it was worse).

It's almost like there was additional stuff to clean up during tab closing that loaded more stuff into memory? But RSS is always a weird measure, so maybe it's better not to speculate too much about it.

I noticed that after tabs closed there was extra fragmentation in jemalloc's heap, probably due to stuff not being cleaned up earlier. Let's hope that goes away when we fix the main problem.

I have requested a backout of Bug 1728273 which should fix this, but I'd rather check back and make sure the graph goes back down. Then I can look at other things in the short term and come back to this later.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Has Regression Range: --- → yes