USER_INACTIVE GCs do not collect all zones
Categories: Core :: JavaScript: GC, defect, P1
People: Reporter: pbone, Assigned: sfink, NeedInfo
References: Blocks 1 open bug
Details: Keywords: perf-alert
Attachments: 1 file
I don't know if this was a side effect of some of my refactoring, but I noticed via the profiler that USER_INACTIVE GCs do not collect all zones. It seems like they should, but maybe they don't for some other reason.
Comment 1 • 3 years ago (Reporter)
jonco, do you know of any reason why USER_INACTIVE GCs might not currently collect all zones?
Comment 2 • 3 years ago (Reporter)
Comment 3 • 3 years ago (Reporter)
jonco answered via Phabricator that these should be full GCs.
Comment 5 • 3 years ago (bugherder)
Comment 6 • 3 years ago
Backed out for causing Bug 1729890
Comment 7 • 3 years ago
Backout merged to central https://hg.mozilla.org/mozilla-central/rev/e6a5efd2b988
Comment 8 • 3 years ago (Assignee)
pbone: When looking into this, I had one question that isn't really relevant to the real problem (of increased memory usage), but it seems relevant here: why does this make all USER_INACTIVE GCs full GCs instead of just all shrinking GCs?
For example, we have a GCRunner fire with reason=USER_INACTIVE, but we discover IsRegularRateTimerTIcking() again (I guess the user did something?). If mWantAtLeastRegularGC is set, then we might end up doing a non-shrinking USER_INACTIVE GC. It looks like this would happen if we originally wanted a GC for a different reason. That shouldn't be forced to be a full GC, should it?
In short, shouldn't the updated test be for aShrinking == ShrinkingGC rather than aReason == JS::GCReason::USER_INACTIVE?
I'm still trying to remind myself of how all this logic works, and I see no reason for this to cause the observed problem, but this looked a little weird so I thought I'd ask.
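To make the distinction concrete, here is a minimal sketch contrasting the two conditions. The type and function names below are hypothetical stand-ins invented for illustration; the real Gecko code uses JS::GCReason and an aShrinking argument, not these helpers.

```cpp
// Hypothetical stand-ins for the types discussed in this bug.
enum class GCReason { USER_INACTIVE, CC_FINISHED, OTHER };
enum class Shrinking { NonShrinkingGC, ShrinkingGC };

// Condition as landed in the (backed-out) patch: every USER_INACTIVE GC
// becomes a full GC, including non-shrinking ones.
bool fullGCByReason(GCReason reason, Shrinking /*shrinking*/) {
  return reason == GCReason::USER_INACTIVE;
}

// Condition suggested in comment 8: only shrinking GCs become full GCs.
bool fullGCByShrinking(GCReason /*reason*/, Shrinking shrinking) {
  return shrinking == Shrinking::ShrinkingGC;
}
```

The two predicates disagree exactly in the case described above: a runner firing with reason=USER_INACTIVE while the regular-rate timer is ticking again yields a non-shrinking GC, which the first predicate forces to be full and the second does not.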
Comment 9 • 3 years ago (Reporter)
(In reply to Steve Fink [:sfink] [:s:] from comment #8)
> pbone: When looking into this, I had one question that isn't really relevant to the real problem (of increased memory usage), but it seems relevant here: why does this make all USER_INACTIVE GCs full GCs instead of just all shrinking GCs?
> For example, we have a GCRunner fire with reason=USER_INACTIVE, but we discover IsRegularRateTimerTIcking() again (I guess the user did something?). If mWantAtLeastRegularGC is set, then we might end up doing a non-shrinking USER_INACTIVE GC. It looks like this would happen if we originally wanted a GC for a different reason. That shouldn't be forced to be a full GC, should it?
Oh, I think you're right, I forgot these would still be USER_INACTIVE.
> In short, shouldn't the updated test be for aShrinking == ShrinkingGC rather than aReason == JS::GCReason::USER_INACTIVE?
Or maybe both? But I don't think you can have a shrinking GC without USER_INACTIVE at the moment.
> I'm still trying to remind myself of how all this logic works, and I see no reason for this to cause the observed problem, but this looked a little weird so I thought I'd ask.
Thanks, since I've been working on this lately and trying to clean up/document the logic, I don't mind an extra set of eyes to make sure I've got it right. Or even to ask questions when something's not completely clear.
Cheers.
Comment 10 • 3 years ago
(In reply to Cristian Tuns from comment #7)
> Backout merged to central https://hg.mozilla.org/mozilla-central/rev/e6a5efd2b988
== Change summary for alert #31372 (as of Sun, 19 Sep 2021 19:46:16 GMT) ==
Improvements:
| Ratio | Test | Platform | Options | Absolute values (old vs new) |
| --- | --- | --- | --- | --- |
| 8% | Explicit Memory | linux1804-64-shippable-qr | fission tp6 | 838,953,235.69 -> 772,642,542.00 |
| 8% | Explicit Memory | macosx1015-64-shippable-qr | fission tp6 | 717,904,196.22 -> 663,977,456.01 |
| 7% | Explicit Memory | windows10-64-2004-shippable-qr | fission tp6 | 682,592,078.67 -> 631,929,184.85 |
| 7% | Explicit Memory | macosx1015-64-shippable-qr | tp6 | 750,795,029.97 -> 701,188,146.86 |
| 6% | Heap Unclassified | windows10-64-2004-shippable-qr | tp6 | 81,840,163.62 -> 77,062,944.45 |
| ... | ... | ... | ... | ... |
| 3% | Heap Unclassified | macosx1015-64-shippable-qr | fission tp6 | 140,228,940.02 -> 135,557,908.64 |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=31372
Comment 11 • 3 years ago (Assignee)
I'm going to continue looking into this. In particular, I looked at the before/after profiles, and not only does the after (with patch) profile show too few GCs, but the before profile is showing more GCs than seems necessary. But I might be wrong, it depends on why they're running. They appear to be running because we CC'd, so I'm working on a patch to track and report the various CC reasons to try to get a better picture of what's happening.
(Cancelling the needinfo? pbone because it was for the backout that is now complete.)
Comment 12 • 3 years ago
Changing severity to S4 because this is not blocking anybody from using Firefox.
Comment 13 • 3 years ago
(In reply to Cristian Tuns from comment #6)
> Backed out for causing Bug 1729890
== Change summary for alert #31374 (as of Sun, 19 Sep 2021 20:27:04 GMT) ==
Improvements:
| Ratio | Test | Platform | Options | Absolute values (old vs new) |
| --- | --- | --- | --- | --- |
| 30% | facebook ContentfulSpeedIndex | android-hw-g5-7-0-arm7-shippable-qr | warm webrender | 899.83 -> 628.25 |
| 12% | instagram LastVisualChange | android-hw-g5-7-0-arm7-shippable-qr | warm webrender | 506.00 -> 444.67 |
| 12% | instagram fnbpaint | android-hw-g5-7-0-arm7-shippable-qr | warm webrender | 465.25 -> 411.75 |
| 11% | instagram fcp | android-hw-g5-7-0-arm7-shippable-qr | warm webrender | 458.29 -> 406.58 |
| 11% | instagram loadtime | android-hw-g5-7-0-arm7-shippable-qr | warm webrender | 454.71 -> 404.54 |
| ... | ... | ... | ... | ... |
| 2% | amazon-search loadtime | android-hw-g5-7-0-arm7-shippable-qr | warm webrender | 2,024.67 -> 1,982.00 |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=31374
Comment 14 • 3 years ago (Reporter)
sfink and I swapped notes about this and basically reached the same conclusion: when the USER_INACTIVE GC was a zonal GC, the scheduler would always follow it with a delayed full GC. That would trigger a CC, which would trigger a delayed CC_WAITING GC; that GC was zonal and would in turn trigger another full GC. This cycle kept memory low but was perhaps a bit too aggressive. When the patch here made the USER_INACTIVE GC a full GC, it would not set up the CC (or the CC wouldn't schedule the CC_WAITING GC), so the cycle didn't get established and memory usage increased.
sfink and I also decided that he should take this bug and I'll focus on some other regressions.
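The cycle from comment 14 can be modeled as a tiny state machine. This is a toy sketch of the described behavior only, not the actual Gecko scheduler; the names are invented for illustration.

```cpp
// Toy model of the scheduling loop described in comment 14: a zonal GC
// is always followed by a delayed full GC, a full GC triggers a cycle
// collection (CC), and the CC schedules a zonal CC_WAITING GC, which
// restarts the loop.
enum class Event { ZonalGC, FullGC, CC };

// Given the event that just ran, return what the toy scheduler runs next.
Event next(Event e) {
  switch (e) {
    case Event::ZonalGC: return Event::FullGC;   // zonal GC -> delayed full GC
    case Event::FullGC:  return Event::CC;       // full GC -> cycle collection
    case Event::CC:      return Event::ZonalGC;  // CC -> zonal CC_WAITING GC
  }
  return Event::ZonalGC;  // unreachable with the enum values above
}
```

Starting from a zonal USER_INACTIVE GC, the model loops ZonalGC -> FullGC -> CC -> ZonalGC forever, which matches the "aggressive but memory-frugal" behavior before the patch. Making the USER_INACTIVE GC full removes the ZonalGC entry point, and in the real scheduler the follow-on CC (or its CC_WAITING GC) then never got scheduled, so the loop broke and memory grew.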
Comment 15 • 3 years ago
There's an r+ patch which didn't land, and there has been no activity in this bug for 2 weeks.
:sfink, could you have a look please?
For more information, please visit auto_nag documentation.
Comment 16 • 3 years ago (Assignee)
Note to self: the CC_WAITING reason that pbone referred to in his excellent comment 14 summary is the one that I renamed to CC_FINISHED.