Open Bug 1728273 Opened 3 years ago Updated 2 years ago

USER_INACTIVE GCs do not collect all zones

Categories: Core :: JavaScript: GC, defect, P1

Status: REOPENED

People: Reporter: pbone; Assigned: sfink (NeedInfo)

References: Blocks 1 open bug

Keywords: perf-alert

Attachments: 1 file

I don't know if this is a side effect of some of my refactoring, but I noticed via the profiler that USER_INACTIVE GCs do not collect all zones. It seems like they should, but maybe there's some other reason they don't.

jonco, do you know of any reason why USER_INACTIVE GCs might not currently collect all zones?

Flags: needinfo?(jcoppeard)
Assignee: nobody → pbone
Status: NEW → ASSIGNED

jonco answered via phabricator that these should be full GCs.

Flags: needinfo?(jcoppeard)
Pushed by pbone@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/891ac85a6625
All shrinking GCs are full GCs r=jonco,smaug
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 94 Branch

Backed out for causing Bug 1729890

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 94 Branch → ---
Flags: needinfo?(pbone)

pbone: When looking into this, I had one question that isn't really relevant to the real problem (of increased memory usage), but it seems relevant here: why does this make all USER_INACTIVE GCs full GCs instead of just all shrinking GCs?

For example, we have a GCRunner fire with reason=USER_INACTIVE, but we discover IsRegularRateTimerTicking() again (I guess the user did something?). If mWantAtLeastRegularGC is set, then we might end up doing a non-shrinking USER_INACTIVE GC. It looks like this would happen if we originally wanted a GC for a different reason. That shouldn't be forced to be a full GC, should it?

In short, shouldn't the updated test be for aShrinking == ShrinkingGC rather than aReason == JS::GCReason::USER_INACTIVE?

I'm still trying to remind myself of how all this logic works, and I see no reason for this to cause the observed problem, but this looked a little weird so I thought I'd ask.

(In reply to Steve Fink [:sfink] [:s:] from comment #8)

> pbone: When looking into this, I had one question that isn't really relevant to the real problem (of increased memory usage), but it seems relevant here: why does this make all USER_INACTIVE GCs full GCs instead of just all shrinking GCs?

> For example, we have a GCRunner fire with reason=USER_INACTIVE, but we discover IsRegularRateTimerTicking() again (I guess the user did something?). If mWantAtLeastRegularGC is set, then we might end up doing a non-shrinking USER_INACTIVE GC. It looks like this would happen if we originally wanted a GC for a different reason. That shouldn't be forced to be a full GC, should it?

Oh, I think you're right; I forgot these would still be USER_INACTIVE.

> In short, shouldn't the updated test be for aShrinking == ShrinkingGC rather than aReason == JS::GCReason::USER_INACTIVE?

Or maybe both? But I don't think you can have a shrinking GC without USER_INACTIVE at the moment.

> I'm still trying to remind myself of how all this logic works, and I see no reason for this to cause the observed problem, but this looked a little weird so I thought I'd ask.

Thanks. Since I've been working on this lately and trying to clean up/document the logic, I don't mind an extra set of eyes to make sure I've got it right, or even questions when something's not completely clear.

Cheers.

(In reply to Cristian Tuns from comment #7)

> Backout merged to central https://hg.mozilla.org/mozilla-central/rev/e6a5efd2b988

== Change summary for alert #31372 (as of Sun, 19 Sep 2021 19:46:16 GMT) ==

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
8% Explicit Memory linux1804-64-shippable-qr fission tp6 838,953,235.69 -> 772,642,542.00
8% Explicit Memory macosx1015-64-shippable-qr fission tp6 717,904,196.22 -> 663,977,456.01
7% Explicit Memory windows10-64-2004-shippable-qr fission tp6 682,592,078.67 -> 631,929,184.85
7% Explicit Memory macosx1015-64-shippable-qr tp6 750,795,029.97 -> 701,188,146.86
6% Heap Unclassified windows10-64-2004-shippable-qr tp6 81,840,163.62 -> 77,062,944.45
... ... ... ... ...
3% Heap Unclassified macosx1015-64-shippable-qr fission tp6 140,228,940.02 -> 135,557,908.64

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=31372

I'm going to continue looking into this. In particular, I looked at the before/after profiles, and not only does the after (with patch) profile show too few GCs, but the before profile shows more GCs than seems necessary. I might be wrong, though; it depends on why they're running. They appear to be running because we CC'd, so I'm working on a patch to track and report the various CC reasons, to try to get a better picture of what's happening.

(Cancelling the needinfo? pbone because it was for the backout that is now complete.)

Flags: needinfo?(pbone)

Changing severity to S4 because this is not blocking anybody from using Firefox.

Severity: -- → S4
Priority: -- → P1

(In reply to Cristian Tuns from comment #6)

> Backed out for causing Bug 1729890

== Change summary for alert #31374 (as of Sun, 19 Sep 2021 20:27:04 GMT) ==

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
30% facebook ContentfulSpeedIndex android-hw-g5-7-0-arm7-shippable-qr warm webrender 899.83 -> 628.25
12% instagram LastVisualChange android-hw-g5-7-0-arm7-shippable-qr warm webrender 506.00 -> 444.67
12% instagram fnbpaint android-hw-g5-7-0-arm7-shippable-qr warm webrender 465.25 -> 411.75
11% instagram fcp android-hw-g5-7-0-arm7-shippable-qr warm webrender 458.29 -> 406.58
11% instagram loadtime android-hw-g5-7-0-arm7-shippable-qr warm webrender 454.71 -> 404.54
... ... ... ... ...
2% amazon-search loadtime android-hw-g5-7-0-arm7-shippable-qr warm webrender 2,024.67 -> 1,982.00

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=31374

sfink and I swapped notes about this and basically reached the same conclusion: when the USER_INACTIVE GC was a zonal GC, the scheduler would always follow it with a delayed full GC. That full GC would then trigger a CC, which would trigger a delayed CC_WAITING GC; that GC was zonal and would trigger another full GC. This cycle kept memory low but was perhaps a bit too aggressive. When the patch here made the USER_INACTIVE GC a full GC, it would not set up the CC (or the CC wouldn't schedule the CC_WAITING GC), so the cycle never got established and memory usage increased.

sfink and I also decided that he should take this bug and I'll focus on some other regressions.

Assignee: pbone → sphink

There's an r+ patch that didn't land, and there has been no activity in this bug for 2 weeks.
:sfink, could you have a look please?
For more information, please visit auto_nag documentation.

Flags: needinfo?(sphink)
Flags: needinfo?(jcoppeard)
Flags: needinfo?(jcoppeard)

Note to self: the CC_WAITING reason that pbone referred to in his excellent comment 14 summary is the one that I renamed to CC_FINISHED.
