Preparation for eager GC in mainthread event loop
Categories
(Core :: JavaScript: GC, enhancement, P3)
Tracking

Release | Tracking | Status
---|---|---
firefox103 | --- | fixed
People
(Reporter: sfink, Assigned: sfink)
References
(Blocks 1 open bug)
Details
(Keywords: perf-alert)
Attachments
(6 files)
JS_MaybeGC uses a lower threshold for triggering a GC than an allocation-triggered GC. We should consider calling this periodically from the main thread's event loop to make it more likely to do a GC when there is less overhead from scanning the stack.
JS_MaybeGC is already called every second from a worker thread. I was under the mistaken impression that we were calling this from the main thread already.
This might reduce some of our GC overhead, as well as reduce the latency from the initial root marking slice. (Or it might not, if most of our GCs are already coming from the idle task, where the JS stack is already empty.)
Comment 1•5 years ago
(In reply to Steve Fink [:sfink] [:s:] from comment #0)
This is a good idea. We probably only want to do this if there is idle time though, so this API isn't ideal.
One way to do that would be to change JS::IsIdleGCTaskNeeded / RunIdleTimeGCTask to start a major GC as well as performing a minor GC.
Assignee
Comment 2•5 years ago
(In reply to Jon Coppeard (:jonco) (PTO until 4th Jan) from comment #1)
One way to do that would be to change JS::IsIdleGCTaskNeeded / RunIdleTimeGCTask to start a major GC as well as performing a minor GC.
The difficulty with just doing that is that normally when the browser starts a major GC, it also sets up the periodic timer to run the slices. The caller of RunIdleTimeGCTask is in XPCOM, while the timer stuff is in DOM.
It also feels a little unnecessary to have two separate idle triggering mechanisms: the DOM GCTimer stuff in addition to the XPCOM (CycleCollectedJSContext) IsIdleGCTaskNeeded stuff (which creates its own idle runnable).
My current plan:
- Remove the existing IsIdleGCTaskNeeded stuff from XPCOM
- In XPCJSContext::AfterProcessTask, where it calls MaybePokeCC currently, add a new MaybePokeGC
- MaybePokeGC will choose whether to do an immediate nursery collection
- MaybePokeGC will also choose whether to start a new major GC (this would be the change in behavior)
- If it starts a new major GC, also set up the GCTimer

I hope to do those last two by calling the existing nsJSContext::PokeGC, but I'll need to modify it to not insist on doing a full (all zones) GC in this case.
I might also skip the nursery collection if there's a major GC started or ongoing, under the assumption that it'll figure out before each slice whether it wants to do a nursery collection anyway.
This change would mean that not all CycleCollectedJSContexts will do these nursery and new major GCs automatically; it will only happen for xpconnect ones instead. I think that's fine, since I don't think generic CycleCollectedJSContexts exist in practice.
I'll CC mccr8 in the off chance that he can tell me I'm being stupid before I get too far into this.
Comment 3•5 years ago
How much time are we spending doing stack scanning? How many GCs are triggered by allocation? I'm a little concerned about disrupting the current GC too much/GC not enough balance with this, which sounds like a major change in the way we start GCs.
Also, keep in mind that if you start a GC when we're in the middle of a CC we will synchronously finish the CC, which will certainly cause a large pause.
I'm not familiar with how the existing idle stuff works.
Assignee
Comment 4•5 years ago
(In reply to Andrew McCreight [:mccr8] from comment #3)
How much time are we spending doing stack scanning?
For this new path, zero -- we'd only be triggering these from an idle task runner, with nothing on the JS stack. Or rather: this new path would reduce stack scanning, since TOO_MUCH_MALLOC GCs that currently do some stack scanning would be replaced with idle-time preemptive GCs.
How many GCs are triggered by allocation? I'm a little concerned about disrupting the current GC too much/GC not enough balance with this, which sounds like a major change in the way we start GCs.
Looking at telemetry: 47% are CC_FINISHED (the new name of CC_WAITING), meaning they're idle-time. 31% are INTER_SLICE_GC, also idle time. 9% are BG_TASK_FINISHED, which if I'm reading the code correctly gets picked up by an interrupt, so happens in non-idle time. Next is 4% PAGE_HIDE, and everything else is small change. Oh, wait: 2.6% are TOO_MUCH_MALLOC, which is relevant here.
This would presumably shift things quite a bit, by adding a bunch of new EAGER_ALLOC_TRIGGER GCs during idle time. But I'm really uncertain of how many will be picked up by this; CC_FINISHED might still get the bulk of the GCs done, and these new GCs will be locked out during the time when INTER_SLICE_GC is operational. So there's a good chance it won't really do much of anything to the overall picture (and given that it'd be replacing some idle time GCs with different idle time GCs, these shifts may not have any effect).
The reason for trying this out is to shift more non-idle GCs towards idle (EAGER_ALLOC_TRIGGER) GCs. The non-idle ones may not show up that well in telemetry, but they're more likely to cause noticeable pauses. It also shifts more of the scheduling and timing control to Gecko -- Gecko would now hopefully do an EAGER_ALLOC_TRIGGER GC before we are in danger of being forced to GC due to allocation. So the 2.6% non-idle TOO_MUCH_MALLOC ones would hopefully shift to an idle EAGER_ALLOC_TRIGGER bucket.
I don't trust my ability to predict behavior, though. And this would come at a cost of slightly more GCs, since we'll start doing them a little earlier this way.
Also, keep in mind that if you start a GC when we're in the middle of a CC we will synchronously finish the CC, which will certainly cause a large pause.
With my implementation, which I'll post in a moment, I don't do these eager GCs if we're CCIng. One of the benefits of shifting control to Gecko. :-)
I'm not familiar with how the existing idle stuff works.
There's an idle mechanism that's only used for nursery GCs, where after the event loop is drained (in AfterProcessTask) we check whether we'd like to collect the nursery, and if so, create an idle runnable that does a minor GC.
Then there's the main CC/GC scheduling stuff, which normally runs when idle and mostly decides to GC based on a CC having finished. For this, I would be removing the AfterProcessTask mechanism and merging it into the CC/GC setup.
Assignee
Comment 5•5 years ago
Assignee
Comment 6•5 years ago
Assignee
Comment 7•5 years ago
Oh, there's another 2.3% in ALLOC_TRIGGER, some of which will hopefully be absorbed into the new EAGER_ALLOC_TRIGGER ones. FULL_GC_TIMER, also at 2.1%, may decrease as well.
Comment 8•5 years ago
"47% are CC_FINISHED (the new name of CC_WAITING), meaning they're idle-time. 31% are INTER_SLICE_GC, also idle time."
How do those mean idle time? They are just GC reasons, not necessarily idle time GCs, no?
Assignee
Comment 9•5 years ago
(In reply to Olli Pettay [:smaug] from comment #8)
"47% are CC_FINISHED (the new name of CC_WAITING), meaning they're idle-time. 31% are INTER_SLICE_GC, also idle time."
How do those mean idle time? They are just GC reasons, not necessarily idle time GCs, no?
Sorry, I wrote that wrong. I meant "eligible to run during idle time", not guaranteed to run during idle time. This change is all about reducing the number of internal GC triggers that run GC stuff synchronously, outside of the Gecko scheduling control.
Assignee
Comment 10•4 years ago
Assignee
Comment 11•4 years ago
Assignee
Comment 12•4 years ago
Assignee
Comment 13•4 years ago
Comment 14•4 years ago
Comment 15•4 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/04443fa474ac
https://hg.mozilla.org/mozilla-central/rev/77b0b0003ca0
https://hg.mozilla.org/mozilla-central/rev/00d93cc6125f
https://hg.mozilla.org/mozilla-central/rev/efd35f1d9bd5
https://hg.mozilla.org/mozilla-central/rev/ec817d9e8a17
https://hg.mozilla.org/mozilla-central/rev/8f4d646a4bf6
Comment 16•4 years ago
Backed out per dev's request for causing Bug 1758370. Backout link
Comment 17•4 years ago
As commented in bug 1758370 comment 9, this caused telemetry regressions, so I requested backout from central and beta.
Comment 18•4 years ago
Backed out of beta for causing Bug 1758370.
Backouts
https://hg.mozilla.org/releases/mozilla-beta/rev/6da41166b7b7
https://hg.mozilla.org/releases/mozilla-beta/rev/59289286bff4
https://hg.mozilla.org/releases/mozilla-beta/rev/f7ea3de04c38
https://hg.mozilla.org/releases/mozilla-beta/rev/ccdf852be1d3
https://hg.mozilla.org/releases/mozilla-beta/rev/9b88b9492500
https://hg.mozilla.org/releases/mozilla-beta/rev/1eae86c906d3
Comment 19•4 years ago
(In reply to Norisz Fay [:noriszfay] from comment #16)
Backed out per dev's request for causing Bug 1758370. Backout link
== Change summary for alert #33631 (as of Fri, 25 Mar 2022 11:39:03 GMT) ==
Regressions:
Ratio | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|
20% | ebay loadtime | windows10-64-shippable-qr | cold fission webrender | 473.62 -> 568.12 |
18% | nytimes fnbpaint | windows10-64-shippable-qr | cold fission webrender | 326.23 -> 384.25 |
17% | nytimes fcp | windows10-64-shippable-qr | cold fission webrender | 318.04 -> 373.50 |
7% | ebay FirstVisualChange | windows10-64-shippable-qr | cold fission webrender | 339.38 -> 364.08 |
6% | ebay fcp | windows10-64-shippable-qr | cold fission webrender | 310.15 -> 330.08 |
6% | booking dcf | android-hw-g5-7-0-arm7-shippable-qr | warm webrender | 1,074.21 -> 1,140.42 |
6% | ebay fnbpaint | windows10-64-shippable-qr | cold fission webrender | 314.98 -> 333.29 |
2% | ebay SpeedIndex | windows10-64-shippable-qr | cold fission webrender | 1,421.46 -> 1,455.67 |
2% | ebay PerceptualSpeedIndex | windows10-64-shippable-qr | cold fission webrender | 1,155.42 -> 1,181.50 |
Improvements:
Ratio | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|
17% | facebook fcp | android-hw-g5-7-0-arm7-shippable-qr | warm webrender | 627.83 -> 522.50 |
13% | instagram fcp | android-hw-g5-7-0-arm7-shippable-qr | warm webrender | 443.17 -> 384.17 |
13% | instagram fnbpaint | android-hw-g5-7-0-arm7-shippable-qr | warm webrender | 455.12 -> 396.04 |
12% | instagram ContentfulSpeedIndex | android-hw-g5-7-0-arm7-shippable-qr | warm webrender | 526.25 -> 462.75 |
12% | facebook fnbpaint | android-hw-g5-7-0-arm7-shippable-qr | warm webrender | 631.75 -> 557.83 |
... | ... | ... | ... | ... |
4% | booking loadtime | android-hw-g5-7-0-arm7-shippable-qr | warm webrender | 1,232.19 -> 1,185.21 |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=33631
Comment 20•4 years ago
(In reply to Norisz Fay [:noriszfay] from comment #16)
Backed out per dev's request for causing Bug 1758370. Backout link
== Change summary for alert #33674 (as of Mon, 28 Mar 2022 00:48:48 GMT) ==
Improvements:
Ratio | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|
8% | Heap Unclassified | windows10-64-2004-shippable-qr | tp6 | 81,275,135.17 -> 74,449,480.22 |
8% | Heap Unclassified | macosx1015-64-shippable-qr | tp6 | 131,240,698.12 -> 120,834,100.22 |
7% | Resident Memory | macosx1015-64-shippable-qr | tp6 | 754,350,340.64 -> 705,149,270.45 |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=33674
Assignee
Comment 21•4 years ago
Summary of effects: this patch series
- regresses Android warm times
- improves Windows cold times
- decreases (regresses) the percentage of zonal GCs
- increases (regresses) median GC times
- regresses heap-unclassified on OSX and Windows, and RSS on OSX
I have stared at the Android results some, but I don't really understand them. Best guess is that it's shifting around the GCs to happen during the timed region, but it's not clear-cut that that's what's going on. Those regressions are easy to reproduce in CI. Landing all of the patches, but adding on a patch that disables just the eager major GC portion, reverts the regression.
The Windows cold time (eg fcp) improvements would be great to have, though I don't know how real they are (they might be just shifting things around and thus be test artifacts as well).
The zonal GCs regression is very large. I would expect this to mean that when scheduling an eager GC, we find that all zones have reached the eager threshold? That seems unlikely. Perhaps we're hitting the cases where we force a full GC more often?
I'm not sure what to make of the median GC times. The magnitude is large enough to say that it's not good, and this should stay backed out for now. The intention of the patches is to shift more GC work towards idle time, and so one possibility is that this is a good thing: we're doing more overall GC work, but enough is shifted to idle time that we're doing less when not idle. The gc_slice_during_idle telemetry ought to tell whether that's happening, but it appears to be reporting 100% across the board. This might be erroneous, or it might be a rounding error; I'll have to look.
But it seems important to track down the zonal GC percentage difference, since that would also explain spending more time in GC.
I really don't know what to make of the memory usage changes. They should be in the other direction, if anything. Unless the GC is using a bunch of memory itself that isn't classified? That would be unexpected.
Comment 22•4 years ago
(In reply to Pulsebot from comment #14)
Pushed by sfink@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/04443fa474ac
Give CCGCScheduler an mEagerMajorGCReason field (set in a later patch) to
indicate that we would like to do a major GC when convenient. This is in
addition to the usual mMajorGCReason for when we definitely want to do a GC.
r=smaug
https://hg.mozilla.org/integration/autoland/rev/77b0b0003ca0
Add a CCGCScheduler.mEagerMinorGCReason field to indicate we have reached a
threshold where we would like to do a minor GC if convenient. r=smaug
https://hg.mozilla.org/integration/autoland/rev/00d93cc6125f
Rename JS::RunIdleGCTask -> JS::MaybeRunNurseryCollection and pass through
the reason, also rename IDLE_TIME_COLLECTION -> EAGER_NURSERY_COLLECTION.
r=smaug
https://hg.mozilla.org/integration/autoland/rev/efd35f1d9bd5
Move idle-time nursery GCs to a new nsJSContext::MaybePokeGC r=jonco,smaug
https://hg.mozilla.org/integration/autoland/rev/ec817d9e8a17
Make nsJSContext::MaybePokeGC trigger EAGER_ALLOC_TRIGGER major GCs instead
of just eager minor GCs r=jonco,smaug
https://hg.mozilla.org/integration/autoland/rev/8f4d646a4bf6
Remove duplicated logic in the snarl of {gcIfRequested,wantMajorGC,maybeGC}
r=jonco
== Change summary for alert #33467 (as of Sat, 05 Mar 2022 06:03:38 GMT) ==
Improvements:
Ratio | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|
16% | nytimes FirstVisualChange | windows10-64-shippable-qr | cold fission webrender | 413.92 -> 347.42 |
13% | nytimes loadtime | windows10-64-shippable-qr | cold fission webrender | 1,793.02 -> 1,562.71 |
4% | buzzfeed PerceptualSpeedIndex | windows10-64-shippable-qr | cold fission webrender | 799.00 -> 764.92 |
4% | buzzfeed ContentfulSpeedIndex | windows10-64-shippable-qr | cold fission webrender | 871.67 -> 835.42 |
3% | buzzfeed SpeedIndex | windows10-64-shippable-qr | cold fission webrender | 757.54 -> 733.75 |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=33467
Comment 23•3 years ago
Comment 24•3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/0665a477c53a
https://hg.mozilla.org/mozilla-central/rev/5e78c4752103
https://hg.mozilla.org/mozilla-central/rev/33137e7a1eeb
https://hg.mozilla.org/mozilla-central/rev/febcb8b7b78a
https://hg.mozilla.org/mozilla-central/rev/998db5a43dc2
https://hg.mozilla.org/mozilla-central/rev/d2381377a60c