Closed Bug 1729890 Opened 3 years ago Closed 3 years ago

7.76 - 3.24% Explicit Memory / Heap Unclassified + 14 more (Linux, OSX, Windows) regression on Tue September 7 2021

Categories

(Core :: JavaScript: GC, defect)

Tracking

RESOLVED FIXED
Tracking Status
firefox-esr78 --- unaffected
firefox-esr91 --- unaffected
firefox92 --- unaffected
firefox93 --- unaffected
firefox94 + fixed

People

(Reporter: bacasandrei, Assigned: pbone)

References

(Regression)

Details

(Keywords: perf, perf-alert, regression)

Perfherder has detected an awsy performance regression from push 891ac85a6625cff469546795bcde4aca9db2a4c1. As the author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio Test Platform Options Absolute values (old vs new)
8% Explicit Memory windows10-64-2004-shippable-qr fission tp6 630,595,157.43 -> 679,549,100.43
8% Explicit Memory windows10-64-2004-shippable-qr fission tp6 636,652,624.65 -> 685,425,600.17
7% Explicit Memory macosx1015-64-shippable-qr fission tp6 658,811,461.16 -> 707,959,499.11
7% Explicit Memory windows10-64-2004-shippable-qr tp6 673,222,929.13 -> 722,181,997.29
7% Resident Memory windows10-64-2004-shippable-qr fission tp6 848,151,762.56 -> 904,550,369.29
7% Explicit Memory linux1804-64-shippable-qr fission tp6 786,161,982.72 -> 837,966,420.08
7% Resident Memory windows10-64-2004-shippable-qr fission tp6 848,594,952.29 -> 904,032,997.36
6% Resident Memory windows10-64-2004-shippable-qr fission tp6 843,578,847.72 -> 897,380,588.82
6% Resident Memory windows10-64-2004-shippable-qr tp6 904,591,744.59 -> 957,633,997.44
6% Heap Unclassified windows10-64-2004-shippable-qr tp6 78,555,694.94 -> 83,143,584.12
... ... ... ... ...
5% Heap Unclassified windows10-64-2004-shippable-qr fission tp6 88,242,519.16 -> 92,672,239.71
5% Resident Memory linux1804-64-shippable-qr fission tp6 1,178,895,503.69 -> 1,236,880,424.08
5% Heap Unclassified windows10-64-2004-shippable-qr fission tp6 87,829,516.62 -> 91,935,546.92
3% Heap Unclassified macosx1015-64-shippable-qr fission tp6 133,384,800.77 -> 137,782,242.96
3% Heap Unclassified linux1804-64-shippable-qr fission tp6 232,138,557.34 -> 239,649,401.17

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(pbone)

Set release status flags based on info from the regressing bug 1728273

== Change summary for alert #31208 (as of Wed, 08 Sep 2021 13:36:46 GMT) ==

Regressions:

Ratio Test Platform Options Absolute values (old vs new)
8% facebook-cristiano dcf android-hw-p2-8-0-android-aarch64-shippable-qr warm webrender 283.44 -> 306.12

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
3% twitter LastVisualChange linux1804-64-shippable-qr cold fission webrender 2,945.00 -> 2,856.67

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=31208

Depends on: 1729057

It's really strange that making more GCs full GCs would increase memory usage. Just wondering if you have an idea of what's going on here, Paul?

A few notes as we investigated:

  • The awsy subtest here is the "after tabs open" test rather than when GC is forced.
  • For the measurement after the GC, there is no change in numbers.
  • This is an increase in the explicit heap, but there is no change in JS heap.
  • So it seems that we have a transient increase in memory (which still could be considered a defect)
  • We wonder if the full-GC is throwing off the CC heuristics and we are somehow keeping old windows around longer than before?

If you look under artifacts, you can download the memory reports for After Tabs Open before and after the change and then use about:memory in your browser to diff them to see what the difference is.
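If diffing in about:memory is awkward, the same comparison can be scripted: the saved memory reports are gzipped JSON files with a top-level "reports" array of {process, path, amount} entries. A minimal sketch (the 1 MiB threshold is an illustrative choice):

```python
import gzip
import json
from collections import defaultdict

def load_report(path):
    """Load an about:memory dump (gzipped JSON) into {(process, path): bytes}."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        data = json.load(f)
    totals = defaultdict(int)
    for r in data["reports"]:
        totals[(r["process"], r["path"])] += r["amount"]
    return totals

def diff_reports(before, after, threshold=1024 * 1024):
    """Return (key, delta) pairs whose usage changed by at least threshold bytes,
    largest absolute change first."""
    deltas = []
    for key in set(before) | set(after):
        delta = after.get(key, 0) - before.get(key, 0)
        if abs(delta) >= threshold:
            deltas.append((key, delta))
    return sorted(deltas, key=lambda kv: -abs(kv[1]))
```

Running `diff_reports(load_report("before.json.gz"), load_report("after.json.gz"))` then points at the paths (heap-unclassified, bin-unused, ...) that account for the regression.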

That is an oddly large amount of heap-unclassified. I wonder what we could be missing in terms of a reporter. There's some way to enable DMD with AWSY but I don't know how well it works. There's also an increase of 26MB of JS memory in the main process, and an 18MB increase with Gmail (in JS and some other stuff). Similarly for live.com.

Something definitely has gone wrong here. We're not cleaning up nearly as much stuff as normal for whatever reason. The forced GC passes true to the minimize memory usage argument to dumpMemoryReportsToNamedFile().

./mach awsy-test --tp6 --dmd --headless actually worked reasonably well. Something asserts at the end of the script, but I can still run dmd.py on the files in %OBJDIR%/_tests/awsy/results... and get what I need.

I see about 4000 HttpChannelParent instances alive in the TabsOpen checkpoint that are gone when the GC is forced. These seem to account for the bulk of the extra heap-unclassified that we are seeing (tens of kB per channel adds up really fast when there are 4000 of them).
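As a sanity check on that estimate (the ~12 KiB per channel below is an assumed stand-in for "tens of kB"; the actual per-channel size wasn't measured):

```python
channels = 4000                  # HttpChannelParent instances seen in the TabsOpen report
bytes_per_channel = 12 * 1024    # assumed ~12 KiB each; illustrative, not measured
total_mb = channels * bytes_per_channel / (1024 * 1024)
print(f"{total_mb:.0f} MB")      # roughly 47 MB, on the order of the regression
```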

That's strange. I'm not sure what keeps those alive.

(In reply to Jon Coppeard (:jonco) from comment #3)

It's really strange that making more GCs full GCs would increase memory usage. Just wondering if you have an idea of what's going on here Paul?

Thanks for lending a 2nd (& 3rd Ted!) set of eyes.

I was looking at the memory reports and saw that the difference appears to be in bin-unused, which is the fragmentation of the jemalloc heap. But for any cause I could think of, I expected to see a corresponding negative amount for some memory being freed, so I appreciate the extra eyes here.
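For reference, a rough fragmentation figure for the jemalloc bins can be derived from the allocated and bin-unused totals in a memory report. This is an illustrative sketch, not how about:memory itself computes anything:

```python
def bin_fragmentation(allocated, bin_unused):
    """Fraction of jemalloc bin memory that is committed but not handed out."""
    committed = allocated + bin_unused
    return bin_unused / committed if committed else 0.0

# e.g. 600 MiB allocated with 50 MiB of bin-unused:
frac = bin_fragmentation(600 * 2**20, 50 * 2**20)
print(f"{frac:.1%}")  # 7.7%
```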

I started with the Tabs Closed Settled memory reports and saw only this fragmentation. I was unable to get perfherder to show me the subtests due to Bug 1729057, which might have helped direct my search. So my best guess is like Ted said:

We wonder if the full-GC is throwing off the CC heuristics and we are somehow keeping old windows around longer than before?

I want to try messing with the patch to see what extra clues I can get. But I also want to try DMD like Ted did, that may be a more solid lead here.

Thanks everyone.

Assignee: nobody → pbone
Status: NEW → ASSIGNED
Flags: needinfo?(pbone)

Another weirdness (not that we need more) in the pushes I did: although the heap-unclassified numbers are fairly straightforward (we're using a bunch of extra memory after tabs open, until it gets cleared out), the RSS numbers show the extra usage persisting after tabs close. Even in one case where it was temporarily better until tabs closed + 30s or + 30s + GC (at which point it was worse).

It's almost like there was additional stuff to clean up during tab closing that loaded more stuff into memory? But RSS is always a weird measure, so maybe it's better not to speculate too much about it.

Note: I was running into bug 1729057 as well, but somehow the path I went through for the above links works. I clicked on "Compare with another revision" in treeherder, then filled in my try revs and selected awsy.

I will also note that unfortunately --gecko-profile-features does not appear to be implemented for awsy jobs. I wanted to get a marker-only profile out of it, but I didn't get a profile at all.

Here are some marker-only profiles
With Bug 1728273 Patch: https://share.firefox.dev/3Ad8hXg
Reverted: https://share.firefox.dev/39g2Smx

One thing that stands out is how few GCMajor markers happen in the parent process when the patch is there. There are none until the tabs start being closed. When I revert things, there are many GCs as the test case runs.

[Tracking Requested - why for this release]: very large regression on this memory test

(In reply to Steve Fink [:sfink] [:s:] from comment #12)

Another weirdness (not that we need more) in the pushes I did: although the heap-unclassified numbers are fairly straightforward (we're using a bunch of extra memory after tabs open, until it gets cleared out), the RSS numbers show the extra usage persisting after tabs close. Even in one case where it was temporarily better until tabs closed + 30s or + 30s + GC (at which point it was worse).

It's almost like there was additional stuff to clean up during tab closing that loaded more stuff into memory? But RSS is always a weird measure, so maybe it's better not to speculate too much about it.

I noticed that after tabs closed there was extra fragmentation in jemalloc's heap, probably due to stuff not being cleaned up earlier. Let's hope that goes away when we fix the main problem.

I have requested a backout of Bug 1728273 which should fix this, but I'd rather check back and make sure the graph goes back down. Then I can look at other things in the short term and come back to this later.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Has Regression Range: --- → yes