Merge sGCTimer and sGCInterSliceRunner into a single sGCRunner
Categories
(Core :: DOM: Core & HTML, enhancement)
Tracking
()
People
(Reporter: sfink, Assigned: sfink)
References
(Regressed 1 open bug)
Details
(Keywords: perf-alert)
Attachments
(2 files, 1 obsolete file)
This turns out to make some later potential changes easier, but it also shifts a little more scheduling logic into CCGCScheduler
which I guess is good.
Note that my performance test run (ignore all of the vismet tests; only things that have a subtests link are valid) shows a pretty large speedup in some tests, which is a little disturbing. I didn't think this should change much of anything, and if it's a real improvement rather than noise then it may make it harder to land followup changes because they might bump it back. (This isn't just a refactoring, I don't think -- the timer + runner would probably have different idle behavior, and the delay & budget & deadlines aren't always going to be the same.)
Then again, the performance results of my manual pushes never seems to resemble the post-landing behavior, so I don't want to conclude too much from those pushes.
Hm... looking at the graph of facebook-redesign fnbpaint, there really isn't any other explanation for the drop. All of the faster (150ms) points are from my pushes that include this change.
Assignee | ||
Comment 1•3 years ago
|
||
Updated•3 years ago
|
Comment 2•3 years ago
•
|
||
Hi sfink, I'd like to use your single GC timer to help make some changes in Bug 1629064.
Assignee | ||
Comment 3•3 years ago
|
||
Updated•3 years ago
|
Pushed by sfink@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/fa92a247020d Add an initial delay to IdleTaskRunner r=smaug
Comment 5•3 years ago
|
||
bugherder |
Pushed by sfink@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/7480579b4348 Merge sGCTimer + sInterSliceGCRunner -> sGCRunner r=smaug
Backout by btara@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ae8e48f5397c Backed out changeset 7480579b4348 for abouthomecache mochitest failures CLOSED TREE
Comment 8•3 years ago
|
||
Backed out changeset 7480579b4348 (bug 1692308) for abouthomecache mochitest failures.
Backout link: https://hg.mozilla.org/integration/autoland/rev/ae8e48f5397cdb0cbc7827768a20b7465ae8e040
Failures logs:
https://treeherder.mozilla.org/logviewer?job_id=333050390&repo=autoland&lineNumber=2967
[task 2021-03-12T19:23:02.478Z] 19:23:02 INFO - TEST-START | browser/components/newtab/test/browser/abouthomecache/browser_no_cache.js
[task 2021-03-12T19:24:32.503Z] 19:24:32 INFO - TEST-INFO | started process screentopng
[task 2021-03-12T19:24:32.918Z] 19:24:32 INFO - TEST-INFO | screentopng: exit 0
[task 2021-03-12T19:24:32.918Z] 19:24:32 INFO - Buffered messages logged at 19:23:02
[task 2021-03-12T19:24:32.918Z] 19:24:32 INFO - Entering test bound test_no_cache
[task 2021-03-12T19:24:32.918Z] 19:24:32 INFO - Test is clearing the cache
[task 2021-03-12T19:24:32.918Z] 19:24:32 INFO - Test has cleared the cache.
[task 2021-03-12T19:24:32.918Z] 19:24:32 INFO - Simulating restart of the browser
[task 2021-03-12T19:24:32.918Z] 19:24:32 INFO - Intentionally skipping shutdown write
[task 2021-03-12T19:24:32.918Z] 19:24:32 INFO - Waiting for AboutHomeStartupCacheChild to uninit
[task 2021-03-12T19:24:32.918Z] 19:24:32 INFO - Console message: 1615576982788 AboutHomeStartupCache TRACE Clearing the cache.
[task 2021-03-12T19:24:32.918Z] 19:24:32 INFO - Console message: 1615576982788 AboutHomeStartupCache TRACE Uninitialized.
[task 2021-03-12T19:24:32.918Z] 19:24:32 INFO - AboutHomeStartupCacheChild uninitted
[task 2021-03-12T19:24:32.919Z] 19:24:32 INFO - Waiting for AboutHomeStartupCache cache entry
[task 2021-03-12T19:24:32.921Z] 19:24:32 INFO - Console message: 1615576982791 AboutHomeStartupCache TRACE Initting.
[task 2021-03-12T19:24:32.924Z] 19:24:32 INFO - Console message: 1615576982791 AboutHomeStartupCache TRACE Constructing pipes.
[task 2021-03-12T19:24:32.925Z] 19:24:32 INFO - Console message: 1615576982792 AboutHomeStartupCache TRACE Initialized.
[task 2021-03-12T19:24:32.926Z] 19:24:32 INFO - Console message: 1615576982792 AboutHomeStartupCache TRACE A privileged about content process is launching with ID 1.
[task 2021-03-12T19:24:32.926Z] 19:24:32 INFO - Console message: 1615576982792 AboutHomeStartupCache INFO Sending input streams down to content process.
[task 2021-03-12T19:24:32.927Z] 19:24:32 INFO - Buffered messages logged at 19:23:47
[task 2021-03-12T19:24:32.928Z] 19:24:32 INFO - Longer timeout required, waiting longer... Remaining timeouts: 1
[task 2021-03-12T19:24:32.935Z] 19:24:32 INFO - Buffered messages finished
[task 2021-03-12T19:24:32.935Z] 19:24:32 INFO - TEST-UNEXPECTED-FAIL | browser/components/newtab/test/browser/abouthomecache/browser_no_cache.js | Test timed out -
[task 2021-03-12T19:24:32.936Z] 19:24:32 INFO - GECKO(3040) | MEMORY STAT | vsize 2821MB | residentFast 284MB | heapAllocated 89MB
[task 2021-03-12T19:24:32.937Z] 19:24:32 INFO - TEST-OK | browser/components/newtab/test/browser/abouthomecache/browser_no_cache.js | took 90030ms
[task 2021-03-12T19:24:32.938Z] 19:24:32 INFO - Not taking screenshot here: see the one that was previously logged
[task 2021-03-12T19:24:32.938Z] 19:24:32 INFO - TEST-UNEXPECTED-FAIL | browser/components/newtab/test/browser/abouthomecache/browser_no_cache.js | Found a tab after previous test timed out: about:home -
[task 2021-03-12T19:24:32.939Z] 19:24:32 INFO - checking window state
[task 2021-03-12T19:24:32.940Z] 19:24:32 INFO - TEST-START | browser/components/newtab/test/browser/abouthomecache/browser_no_startup_actions.js
[task 2021-03-12T19:24:32.941Z] 19:24:32 INFO - Not taking screenshot here: see the one that was previously logged
[task 2021-03-12T19:24:32.942Z] 19:24:32 INFO - Buffered messages logged at 19:24:32
[task 2021-03-12T19:24:32.942Z] 19:24:32 INFO - Entering test bound test_no_startup_actions
[task 2021-03-12T19:24:32.943Z] 19:24:32 INFO - Console message: 1615577072633 AboutHomeStartupCache TRACE Received usage result. Success = false
[task 2021-03-12T19:24:32.944Z] 19:24:32 INFO - Test is clearing the cache
[task 2021-03-12T19:24:32.948Z] 19:24:32 INFO - Buffered messages finished
[task 2021-03-12T19:24:32.949Z] 19:24:32 INFO - TEST-UNEXPECTED-FAIL | browser/components/newtab/test/browser/abouthomecache/browser_no_startup_actions.js | Uncaught exception - at resource:///modules/BrowserGlue.jsm:5392 - TypeError: can't access property "recreate", this._cacheEntry is null
[task 2021-03-12T19:24:32.950Z] 19:24:32 INFO - Stack trace:
[task 2021-03-12T19:24:32.950Z] 19:24:32 INFO - clearCache@resource:///modules/BrowserGlue.jsm:5392:5
[task 2021-03-12T19:24:32.951Z] 19:24:32 INFO - clearCache@chrome://mochitests/content/browser/browser/components/newtab/test/browser/abouthomecache/head.js:209:25
[task 2021-03-12T19:24:32.952Z] 19:24:32 INFO - test_no_startup_actions/<@chrome://mochitests/content/browser/browser/components/newtab/test/browser/abouthomecache/browser_no_startup_actions.js:16:11
[task 2021-03-12T19:24:32.953Z] 19:24:32 INFO - withNewTab@resource://testing-common/BrowserTestUtils.jsm:148:24
[task 2021-03-12T19:24:32.953Z] 19:24:32 INFO - Leaving test bound test_no_startup_actions
[task 2021-03-12T19:24:32.954Z] 19:24:32 INFO - Console message: 1615577072895 AboutHomeStartupCache TRACE Clearing the cache.
[task 2021-03-12T19:24:32.955Z] 19:24:32 INFO - GECKO(3040) | MEMORY STAT | vsize 2821MB | residentFast 285MB | heapAllocated 90MB
[task 2021-03-12T19:24:32.955Z] 19:24:32 INFO - TEST-OK | browser/components/newtab/test/browser/abouthomecache/browser_no_startup_actions.js | took 320ms
[task 2021-03-12T19:24:32.955Z] 19:24:32 INFO - Not taking screenshot here: see the one that was previously logged
[task 2021-03-12T19:24:32.956Z] 19:24:32 INFO - TEST-UNEXPECTED-FAIL | browser/components/newtab/test/browser/abouthomecache/browser_no_startup_actions.js | Found an unexpected tab at the end of test run: about:home -
[task 2021-03-12T19:24:32.963Z] 19:24:32 INFO - checking window state
...
...
...
https://treeherder.mozilla.org/logviewer?job_id=333054857&repo=autoland&lineNumber=4949
[task 2021-03-12T20:37:47.566Z] 20:37:47 INFO - Entering test bound testUpdateDefaultMetadataWhenPageTitleChanges
[task 2021-03-12T20:37:47.566Z] 20:37:47 INFO - open media page
[task 2021-03-12T20:37:47.567Z] 20:37:47 INFO - Console message: [JavaScript Error: "The character encoding of the HTML document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the page must be declared in the document or in the transfer protocol." {file: "https://example.com/browser/dom/media/mediacontrol/tests/browser/file_non_autoplay.html" line: 0}]
[task 2021-03-12T20:37:47.567Z] 20:37:47 INFO - Buffered messages logged at 20:37:47
[task 2021-03-12T20:37:47.567Z] 20:37:47 INFO - start media
[task 2021-03-12T20:37:47.567Z] 20:37:47 INFO - TEST-PASS | dom/media/mediacontrol/tests/browser/browser_media_control_metadata.js | video started playing - true == true -
[task 2021-03-12T20:37:47.567Z] 20:37:47 INFO - should use default metadata because of lacking of media session
[task 2021-03-12T20:37:47.568Z] 20:37:47 INFO - TEST-PASS | dom/media/mediacontrol/tests/browser/browser_media_control_metadata.js | l10n fallback title is not empty -
[task 2021-03-12T20:37:47.568Z] 20:37:47 INFO - handle event=activated
[task 2021-03-12T20:37:47.568Z] 20:37:47 INFO - handle event=playbackstatechange
[task 2021-03-12T20:37:47.568Z] 20:37:47 INFO - TEST-PASS | dom/media/mediacontrol/tests/browser/browser_media_control_metadata.js | Using website title as a default title - "Non-Autoplay page" == "Non-Autoplay page" -
[task 2021-03-12T20:37:47.568Z] 20:37:47 INFO - TEST-PASS | dom/media/mediacontrol/tests/browser/browser_media_control_metadata.js | Default metada contains one artwork -
[task 2021-03-12T20:37:47.568Z] 20:37:47 INFO - TEST-PASS | dom/media/mediacontrol/tests/browser/browser_media_control_metadata.js | Using default favicon as a default art work -
[task 2021-03-12T20:37:47.568Z] 20:37:47 INFO - default metadata should be updated after page title changes
[task 2021-03-12T20:37:47.569Z] 20:37:47 INFO - handle event=metadatachange
[task 2021-03-12T20:37:47.569Z] 20:37:47 INFO - TEST-PASS | dom/media/mediacontrol/tests/browser/browser_media_control_metadata.js | l10n fallback title is not empty -
[task 2021-03-12T20:37:47.569Z] 20:37:47 INFO - TEST-PASS | dom/media/mediacontrol/tests/browser/browser_media_control_metadata.js | Using website title as a default title - "new title" == "new title" -
[task 2021-03-12T20:37:47.569Z] 20:37:47 INFO - TEST-PASS | dom/media/mediacontrol/tests/browser/browser_media_control_metadata.js | Default metada contains one artwork -
[task 2021-03-12T20:37:47.569Z] 20:37:47 INFO - TEST-PASS | dom/media/mediacontrol/tests/browser/browser_media_control_metadata.js | Using default favicon as a default art work -
[task 2021-03-12T20:37:47.569Z] 20:37:47 INFO - after setting metadata, title change won't affect current metadata
[task 2021-03-12T20:37:47.569Z] 20:37:47 INFO - handle event=metadatachange
[task 2021-03-12T20:37:47.570Z] 20:37:47 INFO - handle event=metadatachange
[task 2021-03-12T20:37:47.570Z] 20:37:47 INFO - TEST-PASS | dom/media/mediacontrol/tests/browser/browser_media_control_metadata.js | tile 'foo' is equal to foo -
[task 2021-03-12T20:37:47.570Z] 20:37:47 INFO - TEST-PASS | dom/media/mediacontrol/tests/browser/browser_media_control_metadata.js | artist 'bar' is equal to bar -
[task 2021-03-12T20:37:47.570Z] 20:37:47 INFO - TEST-PASS | dom/media/mediacontrol/tests/browser/browser_media_control_metadata.js | album 'foo' is equal to foo -
[task 2021-03-12T20:37:47.570Z] 20:37:47 INFO - TEST-PASS | dom/media/mediacontrol/tests/browser/browser_media_control_metadata.js | artwork length '1' is equal to 1 -
[task 2021-03-12T20:37:47.570Z] 20:37:47 INFO - TEST-PASS | dom/media/mediacontrol/tests/browser/browser_media_control_metadata.js | artwork src 'https://example.com/browser/dom/media/mediacontrol/tests/browser/bar.jpg' includes bar.jpg -
[task 2021-03-12T20:37:47.570Z] 20:37:47 INFO - TEST-PASS | dom/media/mediacontrol/tests/browser/browser_media_control_metadata.js | artwork sizes '128x128' is equal to 128x128 -
[task 2021-03-12T20:37:47.571Z] 20:37:47 INFO - TEST-PASS | dom/media/mediacontrol/tests/browser/browser_media_control_metadata.js | artwork type 'image/jpeg' is equal to image/jpeg -
[task 2021-03-12T20:37:47.571Z] 20:37:47 INFO - remove tab
[task 2021-03-12T20:37:47.571Z] 20:37:47 INFO - wait until finishing close tab wrapper
[task 2021-03-12T20:37:47.571Z] 20:37:47 INFO - handle event=deactivated
[task 2021-03-12T20:37:47.571Z] 20:37:47 INFO - TEST-PASS | dom/media/mediacontrol/tests/browser/browser_media_control_metadata.js | First event should be 'activated' -
[task 2021-03-12T20:37:47.571Z] 20:37:47 INFO - TEST-PASS | dom/media/mediacontrol/tests/browser/browser_media_control_metadata.js | Last event should be 'deactivated' -
[task 2021-03-12T20:37:47.572Z] 20:37:47 INFO - Leaving test bound testUpdateDefaultMetadataWhenPageTitleChanges
[task 2021-03-12T20:37:47.572Z] 20:37:47 INFO - Console message: [JavaScript Error: "TypeError: can't access property "audioPlaybackStopped", browser is null" {file: "resource://gre/actors/AudioPlaybackParent.jsm" line: 24}]
[task 2021-03-12T20:37:47.572Z] 20:37:47 INFO - receiveMessage@resource://gre/actors/AudioPlaybackParent.jsm:24:9
[task 2021-03-12T20:37:47.572Z] 20:37:47 INFO -
[task 2021-03-12T20:37:47.572Z] 20:37:47 INFO - Buffered messages finished
[task 2021-03-12T20:37:47.572Z] 20:37:47 INFO - TEST-UNEXPECTED-FAIL | dom/media/mediacontrol/tests/browser/browser_media_control_metadata.js | This test exceeded the timeout threshold. It should be rewritten or split up. If that's not possible, use requestLongerTimeout(N), but only as a last resort. -
[task 2021-03-12T20:37:47.572Z] 20:37:47 INFO - GECKO(1767) | MEMORY STAT | vsize 7756MB | residentFast 326MB | heapAllocated 114MB
[task 2021-03-12T20:37:47.573Z] 20:37:47 INFO - TEST-OK | dom/media/mediacontrol/tests/browser/browser_media_control_metadata.js | took 46515ms
Assignee | ||
Comment 9•3 years ago
|
||
I should have marked this [leave-open] before landing the first patch.
Updated•3 years ago
|
Comment 10•3 years ago
•
|
||
Also seeing the following starting to perma fail with the backed out changes:
https://treeherder.mozilla.org/logviewer?job_id=333067441&repo=autoland&lineNumber=21515
https://treeherder.mozilla.org/logviewer?job_id=333068005&repo=autoland&lineNumber=3706
Assignee | ||
Updated•3 years ago
|
Comment 11•3 years ago
|
||
Pushed by sfink@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e1fa9dc7464b Merge sGCTimer + sInterSliceGCRunner -> sGCRunner r=smaug
Comment 12•3 years ago
|
||
bugherder |
Assignee | ||
Comment 13•3 years ago
|
||
Ugh, this ended up being something very simple. The tests involve dropping a cache entry. I had a bug where I didn't set the GC reason in all cases, which caused GCs to be skipped entirely. I think the cache dropping probably relied on things being collected.
Comment 14•3 years ago
|
||
backout |
Backed out for causing bug 1700313.
https://hg.mozilla.org/integration/autoland/rev/67a0bd10d0f819b17feccfb5577c9bc213ddb265
Comment 15•3 years ago
|
||
backout |
And also from Beta:
https://hg.mozilla.org/releases/mozilla-beta/rev/c8f878c8ba1d
Comment 16•3 years ago
|
||
Backout merged: https://hg.mozilla.org/mozilla-central/rev/67a0bd10d0f8
Comment 17•3 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #14)
Backed out for causing bug 1700313.
https://hg.mozilla.org/integration/autoland/rev/67a0bd10d0f819b17feccfb5577c9bc213ddb265
== Change summary for alert #29546 (as of Fri, 02 Apr 2021 03:14:48 GMT) ==
Regressions:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|---|
2% | startup_about_home_paint_realworld_webextensions | startup_about_home_paint_realworld_webextensions | macosx1015-64-shippable | e10s stylo | 813.58 -> 831.83 |
2% | startup_about_home_paint_realworld_webextensions | startup_about_home_paint_realworld_webextensions | macosx1015-64-shippable | e10s stylo | 811.79 -> 828.08 |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=29546
Comment 18•3 years ago
|
||
NI'ing sfink since his patches have been backed out here.
Updated•3 years ago
|
Comment 19•3 years ago
|
||
The investigations into the perf regression are actually happening in the closed bug 1700313, for anyone who might be following this bug.
Assignee | ||
Updated•3 years ago
|
Updated•3 years ago
|
Comment 20•3 years ago
|
||
Hi Smaug, in case you missed it sfink and I would like to know if we should land this even though it has some perf regressions (it has some improvements too). (sfink asked here: https://phabricator.services.mozilla.com/D112928#3673881 but that may have gotten lost).
Comment 21•3 years ago
|
||
I did comment in phabricator and discussed with sfink on matrix couple of days ago.
As far as I see, sScheduler.SetWantMajorGC(aReason); could cause some issue here, and that is a bug anyhow in the patch.
Could we try to not have that call when PokeGC returns early?
Assignee | ||
Comment 22•3 years ago
|
||
Yeah, I tried testing that change but I'm running into a weird problem during the packaging stage on OSX: https://treeherder.mozilla.org/jobs?repo=try&selectedTaskRun=b61vkcq6Tme-TTzCthLe4w.0&tier=1%2C2%2C3&author=sfink%40mozilla.com&fromchange=f909e5a735f6852ef286c97b322aea7c12618c18
I assumed I had a bad base, but this is still happening after rebasing.
Comment 23•3 years ago
|
||
Pushed by sfink@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e6111bb1d3fe Merge sGCTimer + sInterSliceGCRunner -> sGCRunner
Assignee | ||
Comment 24•3 years ago
|
||
Ok, another rebase did it. And the push seems to have resolved the regression? It was pretty reproducible before.
We'll see what happens this time!
Comment 25•3 years ago
|
||
w00t!
Comment 26•3 years ago
|
||
bugherder |
Comment 27•3 years ago
|
||
This has caused some interesting changes to telemetry for the GC reason. The changes are visible for the initial landing and revert on the backout, and then reappear with the relanding.
There is a large increase in the INTER_SLICE_GC reason and fall in the CC_FINISHED reason. There's also a drop in PAGE_HIDE and MEM_PRESSURE reasons.
I didn't see any performance related changes that correlate with the bug so I don't think there's a problem but I wanted to check if this was expected or might indicate some potential issue?
I was investigating this because the changes on the backout are close in time to some other telemetry changes, but I don't think they're related.
Comment 28•3 years ago
|
||
(In reply to Jon Coppeard (:jonco) from comment #27)
I didn't see any performance related changes that correlate with the bug so I don't think there's a problem but I wanted to check if this was expected or might indicate some potential issue?
These patches did cause a memory regression on AWSY, so maybe the telemetry change provides an explanation for what changed to cause that.
Updated•3 years ago
|
Assignee | ||
Comment 29•3 years ago
|
||
The GC reason changes are at least somewhat expected, but I think I'll want to redo this patch to avoid them (at least for the most part) because this patch is still causing browsertime imgur regressions on OSX that I'm still trying to fully diagnose, and the reason differences are very much getting in the way. I will try to separate out the reason changes into a different patch if I decide I still want them.
Assignee | ||
Comment 30•3 years ago
|
||
This patch makes two changes:
- Bug 1692308 made some changes to how GCReasons are reported.
Previously, the reasons reported for slices depended on how the GC was
triggered: internally triggered GCs would have the internal reason for their
first slice and INTER_SLICE_GC for every later slice. PokeGC-triggered GCs
would use the PokeGC reason for all slices. (Or if RunNextCollectorTimer was
called before the GC began, its reason would get swapped in for the PokeGC
one.)
After bug 1692308, INTER_SLICE_GC was used for all non-initial slices. This
change wasn't entirely intentional (part of it was).
This patch reverts to the previous behavior, and in so doing adjusts the use of
sScheduler.mMajorGCReason (it maintains the reason that we want to use for
later slices, rather than clearing it when we get the go-ahead from the
parent.)
Note that this handling of GCReasons is intended to be temporary. I want to fix
it properly in a later bug.
- Make it so internally-initiated GCs don't need to wait for the parent to
allow them to continue.
Comment 31•3 years ago
|
||
Comment on attachment 9221865 [details]
Revert to pre-1692308 GCReason handling, and allow internally-initiated GCs to proceed without waiting for permission from parent process
Revision D115096 was moved to bug 1711128. Setting attachment 9221865 [details] to obsolete.
Assignee | ||
Comment 32•3 years ago
|
||
I reverted the GCReason changes as best I could (pbone has some patches on top that made further changes, but it should be pretty much back to what it was.)
I still intend to make the changes that you observed, but I moved that to bug 1711135.
Description
•