GC_MINOR_US telemetry changes around 21-01-08
Categories
(Core :: JavaScript: GC, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr78 | --- | unaffected |
firefox85 | --- | unaffected |
firefox86 | --- | wontfix |
firefox87 | --- | wontfix |
firefox88 | --- | wontfix |
firefox89 | --- | fix-optional |
People
(Reporter: jonco, Assigned: jonco)
References
(Regression)
Details
(Keywords: regression)
Attachments
(7 files)
199.56 KB,
image/png
|
Details | |
143.39 KB,
image/png
|
Details | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review |
Bug 1685132 was meant to simplify the implementation of tracing while also increasing performance. Looking at telemetry shows it had mixed results:
https://glam.telemetry.mozilla.org/firefox/probe/gc_minor_us/explore?ref=20210127093943
Long minor GCs improved a lot - e.g. the 75th percentile is down to 791uS from 1330uS. However short collections regressed - e.g. the 25th percentile increased from 77uS to 114uS.
Our aim with these is to keep them bellow one millisecond, so this isn't quite as bad as it looks from the attached graph (note the log scale emphases changes at the low end).
My theory is that edges traced via TraceEdge slowed down (maybe due to making this a virtual dispatch) whereas edges traced internally to the tracer became faster (maybe due to simplifying value handling). That would mean quick collections where root tracing dominates became slower, where long collections where evacuation dominates became faster, explaining the results.
Fortunately there's more we can do here to improve things.
Comment 1•4 years ago
|
||
That graph still looks like a net positive to me. The percentiles squeezed together some, reducing overall variance. But most importantly, the top percentiles dipped a bit. Which is far more important in my mind than fast collections becoming a little less fast.
Assignee | ||
Comment 2•4 years ago
|
||
The results on TMO are different for the 5th percentile. The change on the 7th January isn't nearly as pronounced (note this has a linear Y axis scale).
Updated•4 years ago
|
Comment 3•4 years ago
|
||
Set release status flags based on info from the regressing bug 1685132
Assignee | ||
Comment 4•4 years ago
|
||
(In reply to Steve Fink [:sfink] [:s:] from comment #1)
Agreed, slightly longer fast collections are unlikely to be noticeable. I'd still like to improve this though.
Assignee | ||
Comment 5•4 years ago
|
||
One thing we have to do when tracaing is udpate context information (e.g. edge name) if the tracer is a kind that requires it. It's simpler and more efficient to give all tracers this context and perform an unconditional write to the stack.
Mostly we can get away without saving/restoring context information too. This adds AutoClearTracingContext for the one place we need to do this because of nested use of the same tracer while tracing something else.
Assignee | ||
Comment 6•4 years ago
|
||
Previously this called a separate function in another file to do the virtual dispatch. This moves the DoCallback functions to Marking.cpp and marks them inline.
Depends on D103500
Assignee | ||
Comment 7•4 years ago
|
||
Currently we arrange for the low thre bits of the TraceKind value be set for all trace kinds greater than seven (see definition of JS::TraceKind). This is to save a branch creating a GCCellPtr in GCCellPtr::checkedCast.
This has the side effect of generating an 80 entry lookup table when we switch on TraceKind when there are only 12 distinct trace kinds. And clang doesn't use a branch in checkCast anyway.
The patch changes TraceKind so that the values increment by one each time. This affects performance by increasing inlining opportunities in the marking code.
Depends on D103501
Assignee | ||
Comment 8•4 years ago
|
||
Not related to the rest of this bug, I just happened to find it while looking at the generated code. It seems clang inlines a bunch of error paths which should only very rarely be taken. The patch adds hints to stop this from happening.
Depends on D103502
Pushed by jcoppeard@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/5ad5e86d2358 Make updating tracing context information unconditional on the tracer kind r=sfink https://hg.mozilla.org/integration/autoland/rev/c2aced1a8fad Make sure the non-marking path is inlined in TraceEdgeInternal r=sfink https://hg.mozilla.org/integration/autoland/rev/7b413c436848 Simplify TraceKind definition to reduce code generated by switch statements r=sfink https://hg.mozilla.org/integration/autoland/rev/3cf752580f1b Don't inline error paths in marking r=sfink
Comment 11•4 years ago
|
||
bugherder |
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Updated•4 years ago
|
Comment 12•4 years ago
|
||
Set release status flags based on info from the regressing bug 1685132
Updated•4 years ago
|
Assignee | ||
Comment 13•4 years ago
|
||
Well, those changes didn't help.
It's possible that bug 1685128 also had an effect here. We can try making that less aggressive and see what effect that has.
Assignee | ||
Comment 14•4 years ago
|
||
Depends on D106126
Comment 15•4 years ago
|
||
As I commented in the review, this tweak will only make the checks for an underused nursery more frequent. It will not have a direct effect on idle-time minor GCs triggered because the nursery is filling up. I think I got it a little wrong, though; despite the name, Nursery::isUnderused
doesn't consider the size of the nursery at all, other than to return false if we're already at the minimum size. timeToCheckWhetherWeCanShrinkTheNursery
would be a more accurate description.
I'm a little worried that this parameter change won't even affect the underused check much at all, though. Consider the situation, which is probably quite common, where you do a bunch of setup and interaction and then go idle. While things are active, tasks are running so the microtask queue is frequently getting drained. The last task executed is unlikely to trigger a minor GC due to the underuse check because there was probably a recent check shortly before in an earlier task. Then we go idle, stop running any tasks, and therefore stop checking for underuse entirely. So the underuse check frequency only has an effect while things are active, and never gets consulted when the system is idle. Furthermore, while things are active we're likely to be allocating stuff and so the nursery won't be underused.
If we want to shrink an underused nursery, then we probably want the behavior where after we go idle for a while, we collect the nursery (no matter how full it is), and then periodically notice that it's not being used much and shrink it. If we can't rely on tasks to force the latter checks, we depend on major GC timing to trigger minor GCs that will cause the nursery to shrink. I don't know if we do enough major GCs during idle time to get the behavior we want?
Anyway, I feel I've probably gone a little too far out on the speculation limb without looking at data, so I'll just leave it here for consideration for now.
Comment 16•4 years ago
|
||
Another thing: increasing this parameter may shift the histogram to make the 5th percentile even more expensive. It will reduce minor GCs, but preferentially those that tend to be very fast (the idle time ones that we're only doing in hopes of being able to shrink the nursery). If you just cut out the fastest collections, your 5th percentile will shift up to include the remaining more expensive ones.
Assignee | ||
Comment 17•4 years ago
|
||
(In reply to Steve Fink [:sfink] [:s:] from comment #15)
As I commented in the review, this tweak will only make the checks for an underused nursery more frequent. It will not have a direct effect on idle-time minor GCs triggered because the nursery is filling up. I think I got it a little wrong, though; despite the name,
Nursery::isUnderused
doesn't consider the size of the nursery at all, other than to return false if we're already at the minimum size.timeToCheckWhetherWeCanShrinkTheNursery
would be a more accurate description.
It should make them less frequent. I assume this is what you meant. Indeed, it doesn't affect the eager collection of nearly-full nurseries. The naming isn't great but I was struggling to come up with something better.
I'm a little worried that this parameter change won't even affect the underused check much at all, though. Consider the situation, which is probably quite common, where you do a bunch of setup and interaction and then go idle. While things are active, tasks are running so the microtask queue is frequently getting drained. The last task executed is unlikely to trigger a minor GC due to the underuse check because there was probably a recent check shortly before in an earlier task. Then we go idle, stop running any tasks, and therefore stop checking for underuse entirely. So the underuse check frequency only has an effect while things are active, and never gets consulted when the system is idle. Furthermore, while things are active we're likely to be allocating stuff and so the nursery won't be underused.
Yes this is true. So this is mainly going to come into effect when the browser is not idle but has stopped running JS, or the allocation rate has fallen and left as with a large nursery that isn't filling quickly.
If we want to shrink an underused nursery, then we probably want the behavior where after we go idle for a while, we collect the nursery (no matter how full it is), and then periodically notice that it's not being used much and shrink it. If we can't rely on tasks to force the latter checks, we depend on major GC timing to trigger minor GCs that will cause the nursery to shrink. I don't know if we do enough major GCs during idle time to get the behavior we want?
Yes, this would be desirable as well. On workers we do a shrinking GC after they have been idle for a bit. This doesn't happen for the main thread AFAIK.
Another thing: increasing this parameter may shift the histogram to make the 5th percentile even more expensive. It will reduce minor GCs, but preferentially those that tend to be very fast (the idle time ones that we're only doing in hopes of being able to shrink the nursery). If you just cut out the fastest collections, your 5th percentile will shift up to include the remaining more expensive ones.
Yes, that is a possibility. I'd still like to try it and see what happens.
Comment 18•4 years ago
|
||
Pushed by jcoppeard@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/25225e079907 Increase idle time nursery collection timeout to make it less agressive r=sfink
Comment 19•4 years ago
|
||
bugherder |
Updated•4 years ago
|
Updated•4 years ago
|
Assignee | ||
Comment 20•3 years ago
|
||
I'm not planning on doing anything more in this bug.
Updated•3 years ago
|
Updated•3 years ago
|
Description
•