Open Bug 1900470 Opened 6 months ago Updated 23 days ago

Cycle collection times get worse over time when running the sp3 test "NewsSite-Nuxt" with many iterations

Categories

(Core :: Cycle Collector, defect)

People

(Reporter: mstange, Unassigned)

References

Details

Profile: https://share.firefox.dev/3wPtHxq

When running https://speedometer-preview.netlify.app/?suite=NewsSite-Next&iterationCount=300, CC times get longer and longer over time on some machines.

The Windows machine in the Toronto office can easily reach two-second CCSlice times within the first 100 iterations.

My macOS machine needed 188 iterations to reach a 980ms CCSlice.

When I switch tabs while running the test, the test is suspended; if I leave the speedometer tab in the background for 10 seconds or so and switch back to it, subsequent iterations do not show long CC times... sometimes.
Here's a profile where I switch away three times; after the third switch, the CCSlice times remained below 200ms for the remaining 100 iterations: https://share.firefox.dev/4c4OHii

I worry that this behavior means we're not cleaning up enough things for the speedometer iframes when we navigate to a different subtest, and that we accumulate more garbage than other browsers during the test.

Component: DOM: Core & HTML → Cycle Collector

A potentially relevant thing I noticed recently is that all of our leak tests using checkForEventListenerLeaks have intermittent failures (where things aren't collected). Because those tests are all checking for the leak of a single specific item, they might be easier to dig into, since the cycle collection graphs involved are much smaller. A rough sketch of what those checks do is below.
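The checks all boil down to the same pattern, sketched here in mochitest style. This is a hedged illustration, not the actual helper: the data: URL and the direct use of a weak reference are my own simplification of what checkForEventListenerLeaks does.

```js
// Rough sketch of the leak-check pattern (illustrative, not the real helper):
// attach a listener inside a subframe, drop every strong reference to the
// frame, then verify that GC + CC can actually collect its window.
add_task(async function sketch_listener_leak_check() {
  const frame = document.createElement("iframe");
  frame.src = "data:text/html,<body>leak check</body>";
  document.body.appendChild(frame);
  await new Promise(resolve => (frame.onload = resolve));

  // The listener creates a JS/C++ cycle that only the cycle collector can break.
  frame.contentDocument.body.addEventListener("click", () => {});

  // Keep only a weak reference to the window before tearing the frame down.
  const weakWin = SpecialPowers.Cu.getWeakReference(frame.contentWindow);
  frame.remove();

  // exactGC() runs several GC/CC rounds; if the window survives, something is
  // still holding it alive, which is what the intermittent failures look like.
  await SpecialPowers.exactGC();
  ok(!weakWin.get(), "frame window should have been collected");
});
```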

See also bug 1842074 for previous investigations into the Firefox heap growing during Speedometer3, though I don't know if we ever figured anything out.

The cycle collector might not be running enough. The scheduling isn't set up to deal very well with situations like this, where we load a zillion pages very quickly. In the profile, there are 50 seconds between the end of one CC and the start of the next. Looking at the CYCLE_COLLECTOR_TIME_BETWEEN telemetry value in my own normal browsing session, the most common value is 6 seconds. So the gap between CCs is almost ten times longer than normal, when presumably there's a lot more to collect.
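For anyone who wants to compare against their own session, the local distribution can be read from the Browser Console. A minimal sketch, assuming the current histogram snapshot shape (a `values` bucket map plus a `sum`):

```js
// Browser Console (chrome context). CYCLE_COLLECTOR_TIME_BETWEEN records how
// many seconds elapse between the end of one CC and the start of the next.
const hist = Services.telemetry.getHistogramById("CYCLE_COLLECTOR_TIME_BETWEEN");
const snap = hist.snapshot();
// `values` maps bucket lower bounds (in seconds) to sample counts.
console.log("samples per bucket:", snap.values, "sum of seconds:", snap.sum);
```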

That being said, during the large period where we appear to spend almost no time on the CC, the marker chart shows 6 major GCs where the GC reason is CC_FINISHED. Then we have a major GC where the GC reason is FULL_GC_TIMER, and immediately after that we run our giant CC. So maybe we're doing some kind of compartmental GCs that aren't marking enough stuff grey, so the CC runs only extremely briefly and fails to collect things, rather than the CC being shut out from running. I guess I'll have to see how to dig into the tiny CC markers to see if there's information there. But this does sound kind of like the problem we were seeing in bug 1842074.
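One quick way to probe that hypothesis is to force an all-zones GC followed by a CC from the Browser Console mid-run and see whether the next CCSlice markers shrink. This is just a sketch using the privileged Components.utils helpers, not a proposed fix:

```js
// Browser Console (chrome context). A full GC marks gray roots in every zone,
// which gives the cycle collector a complete graph to traverse.
Components.utils.forceGC();          // all-zones GC, unlike the zonal CC_FINISHED collections
Components.utils.forceCC();          // run a cycle collection on the freshly marked heap
Components.utils.forceShrinkingGC(); // optionally return freed pages to the OS
```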

Another possible issue is that the test harness might be holding things alive. We've had issues in the past where one of our Mochitest test suites was holding alive every window for the entire run. This seems less likely to be the immediate cause of the problem you're describing, given that the CC isn't really running at all for almost a minute, but it is something to keep in mind.

See Also: → 1842074

Yeah, so looking at the profile, we are running the CC a bunch, but it finishes very quickly and isn't collecting much.

Summary: Cycle collection times get worse over time when runinng the sp3 test "NewsSite-Nuxt" with many iterations → Cycle collection times get worse over time when running the sp3 test "NewsSite-Nuxt" with many iterations
Severity: -- → S3
See Also: → 1925335
See Also: → 1928791