Open Bug 1329900 Opened 7 years ago Updated 2 years ago

GC takes a long time on the test case in bug 1269695

Categories: Core :: JavaScript: GC, defect, P2

Performance Impact: low

People: Reporter: ehsan.akhgari; Unassigned; NeedInfo

References: Blocks 1 open bug

Keywords: perf

Attachments: 4 files

Attached image GC breakdown
While profiling bug 1269695, I'm noticing several seconds of GC.  For example, see the screenshot which shows the Instruments breakdown of a given run.

I also captured a GC log by setting MOZ_GCTIMER, which I will attach shortly.
Attached file GC log
Attachment #8825300 - Attachment mime type: text/x-log → text/plain
That log has 1975ms of GC in it.

I ran it on my machine. With the default settings, I had runs with 2.0sec and 1.9sec of GC. Disabling incremental GC dropped it to 1.7sec. Changing a bunch of GC settings to ridiculously high values dropped the total GC time to 0.6sec. Those settings were:

javascript.options.mem.gc_allocation_threshold_mb	1000
javascript.options.mem.gc_high_frequency_heap_growth_max	2000
javascript.options.mem.gc_high_frequency_high_limit_mb	1000
javascript.options.mem.gc_high_frequency_low_limit_mb	1500
javascript.options.mem.gc_low_frequency_heap_growth	600
javascript.options.mem.high_water_mark	512

The total memory usage of my browser after this run was about 660MB (this is non-e10s).

There are 24 GCs in Ehsan's log, only 3 of them are zone GCs. (There was only a single tab open, so I'm not sure how much it matters.) It looks like the last 4 GCs are from shutdown.
Depends on: 1329918
Jon, can you take a look at what we might be able to do here? It's a big chunk of a very, very slow Google Docs document load, one where it appears that Chrome takes much less time (I saw around 0.7sec in their GC function for this). There are instructions for reproducing in bug 1269695, but you just have to load https://docs.google.com/document/d/1EPSmGqm2r4Qq42B4t1VOYacjTlL0JVuC8JSlUvoIhss/edit and press ctrl-end to go to the end of the document.
It would be good to understand why there are so few zone GCs. Though during page load I do expect that we run a bunch of frame script code too.
If this is the test case I'm thinking of, one of the problems is that we do a full GC 5 seconds after we start a page load (via a PokeGC call, maybe in SetNewDocument). The page load takes more than 5 seconds, so we're GCing while we're still loading, which is not good.

As for why it is a full GC, bug 1110928 is about making this into a zone GC, but I haven't been able to land it, because it causes OOMs in testing.

It might be interesting to comment out a few of those PokeGC calls that might be triggered during loading and see how that affects things.
To explain it a little more thoroughly... in SetNewDocument (and a few other places) we call PokeGC. What this does is start a 5-second timer. When that timer fires, we run a full GC. I expect the full-GC part of it doesn't matter much for this test case, because people are probably running the test case without any other tabs open (with bug 1110928, we run the zone GC for the chrome zone plus the document zone).
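
To make that scheduling pattern concrete, here is a minimal standalone sketch of it, with hypothetical names (PokeGC/RunFullGC) and a plain thread standing in for the timer; it is not the actual Gecko code, which I believe uses an nsITimer in nsJSEnvironment.cpp:

#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

static std::atomic<bool> gGCTimerArmed{false};

// Placeholder for the collection itself; in Gecko this is currently a full
// (all-zones) GC, which bug 1110928 would narrow to a zone GC.
static void RunFullGC() {
  std::puts("full GC");
}

// Called from places like SetNewDocument: arm a one-shot ~5 second timer.
static void PokeGC() {
  bool expected = false;
  if (!gGCTimerArmed.compare_exchange_strong(expected, true)) {
    return;  // a timer is already pending; don't arm another
  }
  std::thread([] {
    std::this_thread::sleep_for(std::chrono::seconds(5));
    gGCTimerArmed = false;
    RunFullGC();  // fires even if the page is still loading
  }).detach();
}

int main() {
  PokeGC();  // e.g. at the start of a page load
  std::this_thread::sleep_for(std::chrono::seconds(6));
  return 0;
}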
I'm seeing a couple of things in local testing:
 - lots of slow (>10ms) minor GCs
 - non-incremental major GCs due to allocation pressure

For the former, we seem to be consistently tenuring a lot of data even though we have a 16MiB nursery.  It could be that we need to tweak our pre-tenuring heuristics.  I'll attach a patch that might help.  Steve, are you able to test this where you are?

For the latter, adjusting our allocation thresholds might help.  But it does look like this is just allocating a ton of data.
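
To illustrate the allocation-pressure point, here is a rough standalone sketch of that kind of trigger, with made-up names and numbers rather than SpiderMonkey's actual heuristics: once a zone's allocation passes its threshold we start an incremental GC, and once it is far enough past the threshold the collector gives up on slicing and collects non-incrementally.

#include <cstdint>
#include <cstdio>

enum class GCMode { None, Incremental, NonIncremental };

struct ZoneCounters {
  uint64_t bytesAllocated;
  uint64_t thresholdBytes;  // made-up; loosely analogous to the allocation-threshold pref from comment 2
};

GCMode MaybeTriggerGC(const ZoneCounters& zone) {
  if (zone.bytesAllocated < zone.thresholdBytes) {
    return GCMode::None;
  }
  // Allocation has outrun the threshold by a wide margin: an incremental GC
  // cannot keep up, so fall back to a non-incremental collection.
  if (zone.bytesAllocated > zone.thresholdBytes + zone.thresholdBytes / 8) {
    return GCMode::NonIncremental;
  }
  return GCMode::Incremental;
}

int main() {
  ZoneCounters zone{40u * 1024 * 1024, 30u * 1024 * 1024};
  std::printf("mode = %d\n", static_cast<int>(MaybeTriggerGC(zone)));
  return 0;
}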
Attached patch pretenure-more (Splinter Review)
Patch to pre-tenure more when collecting the nursery.
Oh, I have a feeling that the way we tenure nursery objects is not terribly efficient either.  There are a ton of conditions on the type of the object (e.g. see JSObject::allocKindForTenure and TenuringTracer::moveObjectToTenured).  It might be worth investigating whether we can improve this.
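
As a toy illustration of that per-object branching (entirely made-up kinds and alloc-kind names, not the real dispatch in those functions):

#include <cstdio>

// Hypothetical object kinds standing in for the many cases the real
// tenuring path distinguishes (JSObject::allocKindForTenure and
// TenuringTracer::moveObjectToTenured in SpiderMonkey).
enum class ObjKind { Plain, Array, TypedArray, Function, Proxy };

// Every promoted nursery object runs through a chain of checks like this
// just to pick its tenured allocation kind, and then more to copy its
// slots/elements; with millions of promotions that adds up.
const char* AllocKindForTenure(ObjKind kind, bool hasInlineData) {
  switch (kind) {
    case ObjKind::Array:      return "OBJECT2";
    case ObjKind::TypedArray: return hasInlineData ? "OBJECT8" : "OBJECT0";
    case ObjKind::Function:   return "FUNCTION";
    case ObjKind::Proxy:      return "OBJECT4";
    default:                  return "OBJECT0";
  }
}

int main() {
  std::puts(AllocKindForTenure(ObjKind::Array, false));
  return 0;
}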
FWIW, I think the discussion about zone GCs could happen in bug 1110928.
(In reply to Andrew McCreight [:mccr8] from comment #6)
> To explain it a little more thoroughly... in SetNewDocument (and a few other
> places) we call PokeGC. What this does is it starts a 5 second timer. When
> that timer is finished, we run a full GC.

Can we do this on idle or something?  Many of the GSuite test cases I have profiled have load times well over 5s, and GCing in the middle of the load surely won't help those load times.
GCing only on idle can be problematic, leading to OOMs.
But sure, we might be able to tweak the scheduling some more
ehsan, do you still have a link to the original cleopatra profile?

After bug 1110928 and following comment 3 from this bug, I'm not seeing lots of major GCs in the profile, but minor GCs are there. I'll test the patch in this bug.
Hmm, the Gecko profiler is still broken on local Linux builds :/
Attached patch merged (Splinter Review)
jonco's patch which applies to current m-i.

If the profiler's summary is something to rely on (in my case, where symbols aren't available), this may drop GC time in the child process a tiny bit.
(In reply to Olli Pettay [:smaug] from comment #13)
> ehsan, do you still have a link to the original cleopatra profile?

No.  FWIW I wasn't using Cleopatra for this bug; I was using Instruments (a native profiler on OSX).  See the screenshot in comment 0 for example.  You should be able to replicate this with the Zoom profiler (or any other native profiler).
Whiteboard: [qf:p1] → [qf:p3]
Keywords: perf
(ni?sfink since jonco is on PTO)

Steve, this was [qf:p1] for a while and ... well, it looks like it might be low-hanging fruit, guessing from the size of the patch and the fact that there was a reproducible test case. Hence P2. Please reprioritize as appropriate!
Flags: needinfo?(sphink)
Priority: -- → P2
Note to self: consider this bug when looking at making string unpretenuring based on retention rates.

Note to others: the patch is very aggressive, making it *far* more eager to start pretenuring things and skip the nursery entirely: we originally considered pretenuring when 80% of all objects are surviving a minor GC. We've since changed that to 60%. This patch drops it to 20%. Additionally, we only pretenure an individual object class if we tenure at least 3000 of them; this drops it to 1000.
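
As a rough sketch of the decision those thresholds feed into (hypothetical names and structure; the real logic runs as part of the nursery's minor-GC bookkeeping), with the current values and the patch's values noted:

#include <cstdint>
#include <cstdio>
#include <string>
#include <unordered_map>
#include <vector>

constexpr double kPretenureSurvivalFraction = 0.60;  // patch drops this to 0.20
constexpr uint64_t kPretenureGroupMinCount = 3000;   // patch drops this to 1000

struct MinorGCStats {
  uint64_t nurseryObjects = 0;  // objects allocated in the nursery this cycle
  uint64_t tenuredObjects = 0;  // objects promoted by this minor GC
  std::unordered_map<std::string, uint64_t> tenuredByGroup;
};

// Returns the object groups that should be pretenured, i.e. allocated
// directly in the tenured heap from now on, skipping the nursery.
std::vector<std::string> ChoosePretenuredGroups(const MinorGCStats& s) {
  std::vector<std::string> result;
  if (s.nurseryObjects == 0) {
    return result;
  }
  double survival = double(s.tenuredObjects) / double(s.nurseryObjects);
  if (survival < kPretenureSurvivalFraction) {
    return result;  // most objects died young; the nursery is doing its job
  }
  for (const auto& [group, count] : s.tenuredByGroup) {
    if (count >= kPretenureGroupMinCount) {
      result.push_back(group);
    }
  }
  return result;
}

int main() {
  MinorGCStats s{10000, 7000, {{"NodeData", 4000}, {"TempWrapper", 500}}};
  for (const auto& g : ChoosePretenuredGroups(s)) {
    std::puts(g.c_str());
  }
  return 0;
}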

We could try out these new thresholds, but I'd probably want to do it as an A/B test. Pretenuring too much defeats the nursery and causes more and longer major GCs.

But as a smoke test, I'm pushing a talos run to try: https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=015d8f5eca32d2540923797c821a5be1b98f099d
For the record, that shows that on talos the aggressive pretenuring makes almost no difference, but it speeds up one test.

...I'm not sure what to make of that.
Performance Impact: --- → P3
Whiteboard: [qf:p3]
Severity: normal → S3