We seem to be destroying and recreating too many frames on Perf-Dashboard
Categories
(Core :: Layout, defect)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox128 | --- | fixed |
People
(Reporter: mstange, Assigned: emilio)
References
(Blocks 2 open bugs, Regressed 1 open bug)
Details
(Keywords: perf-alert, Whiteboard: [sp3])
Attachments
(4 files, 1 obsolete file)
https://speedometer-preview.netlify.app/?suite=Perf-Dashboard
On the Speedometer 3 subtest "Perf-Dashboard", we spend too much time in mozilla::PresShell::ContentRemoved, mozilla::RestyleManager::ProcessRestyledFrames, and mozilla::ServoStyleSet::StyleDocument.
This is causing our "Update style" and "set Node.textContent" times to be 2x of Chrome's.
If we can match Chrome's time on these two buckets, our Perf-Dashboard score should improve by 6.8%.
Updated•2 years ago
|
| Reporter | ||
Comment 1•2 years ago
|
||
The hypothesis here is that we're hitting a case where we destroy more frames than strictly necessary for the changed DOM elements. For example, we might be destroying the parent frame of the changed element's frame, not just the frame + frame subtree of the changed element. As a result, we spend more time destroying frames, recreating them, and resolving their style.
I tried to create a reduced testcase which shows the perf difference but was unsuccessful. The use of WebComponents makes it a bit tricky.
| Reporter | ||
Comment 2•2 years ago
|
||
| Reporter | ||
Comment 3•2 years ago
|
||
Comment on attachment 9350939 [details]
an attempt at a reduced testcase for the textContent part
This testcase doesn't even use textContent. I'm not sure what I was thinking when I made it; I made it a few months ago.
| Reporter | ||
Comment 4•2 years ago
|
||
(In reply to Markus Stange [:mstange] from comment #1)
The hypothesis here is that we're hitting a case where we destroy more frames than strictly necessary for the changed DOM elements.
This profile from the Gecko profiler corroborates that hypothesis: https://share.firefox.dev/44DT5kt
During the "SelectingRange-sync" step, there are 17 "MaybeRecreateContainerForFrameRemoval: Table or ruby pseudo parent" and 17 "MaybeRecreateContainerForFrameRemoval: IB split parent" markers.
| Reporter | ||
Comment 5•2 years ago
|
||
Hi TYLin, do you have cycles to investigate what happens in this test?
Comment 6•2 years ago
|
||
Given the Gecko profiler data in comment 4, investigating not reframing IB-split is probably a good start. I might not get to this one soon, but I'm keeping the NI as a reminder.
Comment 7•1 year ago
•
|
||
(In reply to Markus Stange [:mstange] from comment #0)
https://speedometer-preview.netlify.app/?suite=Perf-Dashboard
[...]
If we can match Chrome's time on these two buckets, our Perf-Dashboard score should improve by 6.8%.
Is this still an issue?
Running the "Perf-Dashboard" in current Nightly with 60 iterations to reduce noise, I get consistently better (higher) scores in Firefox Nightly vs. Chrome.
On Linux ThinkStation:
Firefox 126.0a1 (2024-03-25): 18.2 ± 0.33
Chrome 123.0.6312.58 (Official Build): 17 ± 0.32
If I click Details and then expand the table to look specifically at the "SelectingRange-sync" subtest referenced in comment 4, I see we're faster (lower time in ms) than Chrome on that specific subtest, too.
Firefox: SelectingRange/Sync 21.77 ± 1.19 (5.5%) ms
Chrome: SelectingRange/Sync 22.92 ± 0.33 (1.5%) ms
Maybe this has been fixed by other speedometer work that we've landed, which got us better batching of invalidations here, or something along those lines?
Comment 8•1 year ago
•
|
||
[macOS]
I ran this on macOS as well, and my results are more variable[1] there (not sure why), but the best score that I'm getting for Firefox there is 11.98 ± 0.19, vs. 8.923 ± 0.086 on Chrome. (higher is better)
In those runs, Details|submetrics shows that we're spending 30.53ms in SelectingRange/Sync, whereas Chrome is spending 42.16ms in that submetric (lower is better).
So it looks like we're winning there too.
[1] In repeated reloads, I sometimes saw scores of ~8 or even ~6, but I did get 11 several times. No idea what's leading to the lower scores, but I suspect it's independent of the original issue that spawned this bug, which was a more "we're slower on-the-whole" issue, if I understand correctly.
| Reporter | ||
Comment 9•1 year ago
•
|
||
Yes, this issue still exists.
We're currently winning on this test because we're faster at other things, for example at canvasCtx.arc(). Once Chrome catches up on those other things, we'll be slower again. Reducing our time spent in frame construction and styling to Chrome's level is still the biggest win to be had on this test - and the hope is that reconstructing fewer frames will achieve that.
Comment 10•1 year ago
|
||
Here's a pernosco recording:
https://pernos.co/debug/7tYavUofQwOIZaugFjYFtA/index.html
That has Firefox running this part of the benchmark with 2 iterations, using this URL:
https://speedometer-preview.netlify.app/?suite=Perf-Dashboard&iterationCount=2
| Assignee | ||
Comment 11•1 year ago
|
||
It seems we reframe the whole <body> when reconstructing the <div> inside the <page-component> element.
Comment 12•1 year ago
|
||
Reclassifying as "defect" for the user-facing perf impact here (though the setup is a bit pathological, and we're not like orders-of-magnitude worse than the competition on the known content that triggers this [speedometer]; hence, classifying as S3)
| Assignee | ||
Comment 13•1 year ago
|
||
That's the first slow case we hit on this page at least, which is optimizable (it's just removing in the middle of an IB-split). Daniel found that the recorded cases have a further issue which is that we're setting .textContent on an anonymous table row, we remove the anonymous wrappers, and then we add them back again.
This is something like:
<span>
<table id="build-revision"></table>
</span>
Where build-revision's .textContent changes, and we reframe the containing block.
That's all kinda silly because, in this case:
- We don't need to reconstruct the
<table>at all. We do because inserting text on a<table>creates a bunch of anonymous wrappers that we need to clean-up. In this case we could just remove the pseudo-parents. - We don't need to reconstruct the ib-split-ancestor either. Just replacing the
<table>would be fine.
But finding the right insertion point if we leave the anonymous ib-split block hanging around becomes rather tricky.
Comment 14•1 year ago
|
||
Updated•1 year ago
|
Comment 15•1 year ago
|
||
STR with testcase 1:
- Wait for testcase to load (a few seconds on my machine)
- Open devtools (F12)
- Click the button and see what gets logged to devtools.
- (optional) Click the button more times.
FIREFOX RESULTS:
The logged measurements are on the order of 700ms for me.
CHROME RESULTS:
The logged measurements are 0-1ms.
(The 700x difference isn't a really reflective of real-world content; it's just because I've made the testcase extra-pathological to make the issue easy to see.)
Comment 16•1 year ago
•
|
||
Here's a profile where I click the button in testcase 1 three two times: https://share.firefox.dev/4aCwhEI
The resulting multi-hundred-millisecond restyles are the frame reconstruction in question here.
Comment 17•1 year ago
•
|
||
(Side note: my profile in comment 16 is using 0.1ms as the sampling interval, so the sample counts might be higher than normal. Don't be fooled into thinking sample-count = milliseconds as it often would be by default.)
Here's the time spent in "set Node.textContent" in that profile of my testcase 1:
https://share.firefox.dev/3wXwtjY
There's one long burst of set Node.textContent activity for each button press, ~170ms each, nearly all of which is spent in frame destruction. (And then it's followed by a ~200ms restyle which is really just reconstructing all those frames we just destroyed, and then a bunch of time spent reflowing those freshly constructed frames, and some DevTools work to account for the fresh geometry since I had devtools open.)
If you look down the callstack of where we're spending time there, you can see there are several nested MaybeRecreateContainerForFrameRemoval, each of which indeed decides to recreate the container frame. Each level that we go down the callstack there, we're actually walking up the frame tree. Here's how our analysis goes:
- We trivially have to recreate the table contents (because its
textContentwas changed, i.e. the entire guts of the table are being replaced). - Since we're reframing the table-contents: that makes us consider whether we should recreate the table itself, and we decide that we should.
- Since we're (temporarily) removing the table: that makes us consider whether we need to recreate the table's containing-block, since the table is part of an IB split (and removing the table removes the need for that IB split). And we decide that we should.
Parts (2) and (3) are both silly for the reasons emilio noted in comment 13.
Comment 18•1 year ago
|
||
Note: another silly aspect here is that textContent isn't being set to anything interesting; it's being set to the empty-string as preparation for giving the table new contents, via the call to renderReplace here I think:
https://github.com/WebKit/Speedometer/blob/33dec4f3282394adbe985724be1cebe8928d87fe/resources/perf.webkit.org/public/v3/pages/chart-pane-status-view.js#L72
Inside renderReplace we set the element's textContent to the empty string:
https://github.com/WebKit/Speedometer/blob/33dec4f3282394adbe985724be1cebe8928d87fe/resources/perf.webkit.org/public/shared/common-component-base.js#L9
...before appending the passed-in new content.
I'm not sure if there are any heuristics that might fall out of that, to make this more straightforward... just noting it for the record, anyway. e.g. maybe we could detect that we're entirely-emptying-the-table, and land on an intermediate state where we leave an empty table frame (or empty table-wrapper frame) in-place, without any content? I don't recall the exact invariants around tables & if either of those might be a valid state to land in...
| Assignee | ||
Updated•1 year ago
|
| Assignee | ||
Updated•1 year ago
|
| Assignee | ||
Comment 19•1 year ago
|
||
| Assignee | ||
Comment 20•1 year ago
|
||
You'll be able to find a performance comparison here once the tests are complete (ensure you select the right framework): https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=69b3fd5419a059f60ce5a7a77a640a050f43b1a8&newProject=try&newRevision=4d866497b7a70b3c2bae3b05493dabb854562664
*******************************************************
* 2 commits/try-runs are created... *
*******************************************************
Base revision's try run: https://treeherder.mozilla.org/jobs?repo=try&revision=69b3fd5419a059f60ce5a7a77a640a050f43b1a8
Local revision's try run: https://treeherder.mozilla.org/jobs?repo=try&revision=4d866497b7a70b3c2bae3b05493dabb854562664
| Assignee | ||
Comment 21•1 year ago
|
||
| Assignee | ||
Comment 22•1 year ago
|
||
I need more retriggers to get more confidence on the try run in comment 20, but retriggers are busted at the moment (see #sheriffs). The numbers look in the right general direction tho.
Markus, on a local build from above, I don't see any meaningful frame destruction taking more time, can you confirm if you have the time? I don't get symbols from a build from try in the profiler...
| Reporter | ||
Comment 23•1 year ago
|
||
I'll get a comparison report with the try build on Monday.
| Assignee | ||
Comment 24•1 year ago
|
||
I had to implement a more aggressive optimization because we were still hitting the slow path in the benchmark. It was a bit more work but not too much, and all the layout/reftests/table-anonymous-boxes tests and ruby tests still pass...
| Assignee | ||
Comment 25•1 year ago
|
||
Updated try run: https://treeherder.mozilla.org/jobs?repo=try&revision=31d382a20054d60727e23205c49a4b13cb894a01
You'll be able to find a performance comparison here once the tests are complete (ensure you select the right framework): https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=69b3fd5419a059f60ce5a7a77a640a050f43b1a8&newProject=try&newRevision=1ffda75b1f16166a0de6ddb43b4475c07050076e
*******************************************************
* 2 commits/try-runs are created... *
*******************************************************
Base revision's try run: https://treeherder.mozilla.org/jobs?repo=try&revision=69b3fd5419a059f60ce5a7a77a640a050f43b1a8
Local revision's try run: https://treeherder.mozilla.org/jobs?repo=try&revision=1ffda75b1f16166a0de6ddb43b4475c07050076e
If you need any help, you can find us in the #perf-help Matrix channel:
https://matrix.to/#/#perf-help:mozilla.org
For more information on the performance tests, see our PerfDocs here:
https://firefox-source-docs.mozilla.org/testing/perfdocs/
| Assignee | ||
Comment 26•1 year ago
|
||
(In reply to Emilio Cobos Álvarez (:emilio) from comment #24)
I had to implement a more aggressive optimization because we were still hitting the slow path in the benchmark. It was a bit more work but not too much, and all the
layout/reftests/table-anonymous-boxestests and ruby tests still pass...
After the patch the slow code-path is gone. We still do hit other slow paths related to IB-splits, and we hit this on insertion a lot more. So maybe there's more to fix. But I think the patch as-is is still worth landing.
| Assignee | ||
Comment 27•1 year ago
|
||
Let's see what try says, I do see less frame reconstruction in the profile at a glance, but whether it's less enough...
| Reporter | ||
Comment 28•1 year ago
|
||
(In reply to Emilio Cobos Álvarez (:emilio) from comment #22)
I don't get symbols from a build from try in the profiler...
You need to trigger the build-linux64-shippable/opt-upload-symbols job for this, and then profile the shippable build. I've triggered it on your pushes now.
| Assignee | ||
Comment 29•1 year ago
|
||
This is also actually needed to avoid the perf cliff in Perf-Dashboard.
| Assignee | ||
Comment 30•1 year ago
|
||
Ok, with comment 29 perfherder is much happier (link).
Try: https://treeherder.mozilla.org/jobs?repo=try&revision=327345f3cddb74a1f0942fed227bd29eafb43026
Comment 31•1 year ago
|
||
Note, I added a mochitest for this in bug 1900278, using the framesConstructed API to check that IB splits don't get additional frames constructed.
The two patches here each make us pass different pieces of that test. With the patches here combined, we pass the whole test, except for one sub-piece which I've marked as an expected-failure and filed bug 1900279 to cover.
| Assignee | ||
Comment 32•1 year ago
|
||
My previous patch had a typo, new try:
https://treeherder.mozilla.org/jobs?repo=try&revision=0b168e14a569306e5dba56a5dbd5e084397f37c2
You'll be able to find a performance comparison here once the tests are complete (ensure you select the right framework): https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=45a539e153a7b515bcc0ec1b4a045766f069f123&newProject=try&newRevision=ddacb492558c22014a9eeedee0cf59e56c2c9992
*******************************************************
* 2 commits/try-runs are created... *
*******************************************************
Base revision's try run: https://treeherder.mozilla.org/jobs?repo=try&revision=45a539e153a7b515bcc0ec1b4a045766f069f123
Local revision's try run: https://treeherder.mozilla.org/jobs?repo=try&revision=ddacb492558c22014a9eeedee0cf59e56c2c9992
If you need any help, you can find us in the #perf-help Matrix channel:
https://matrix.to/#/#perf-help:mozilla.org
For more information on the performance tests, see our PerfDocs here:
https://firefox-source-docs.mozilla.org/testing/perfdocs/
Comment 34•1 year ago
|
||
Comment 35•1 year ago
•
|
||
Backed out for causing wpt failures on computed-style.html
[task 2024-06-04T01:04:26.713Z] 01:04:26 INFO - TEST-START | /html/rendering/widgets/button-layout/computed-style.html
[task 2024-06-04T01:04:26.713Z] 01:04:26 INFO - Closing window 1dea5243-ed5a-4ac5-814b-869726bb233c
[task 2024-06-04T01:04:26.902Z] 01:04:26 INFO - PID 1699 | [Child 1711, Main Thread] ###!!! ASSERTION: inserting after sibling frame with different parent: '!aPrevFrame || aPrevFrame->GetParent() == this', file /builds/worker/checkouts/gecko/layout/tables/nsTableFrame.cpp:2103
[task 2024-06-04T01:04:26.991Z] 01:04:26 INFO - Got content assert count 1
[task 2024-06-04T01:04:27.046Z] 01:04:27 INFO - TEST-UNEXPECTED-FAIL | /html/rendering/widgets/button-layout/computed-style.html | assertion count 1 is more than expected 0 assertions
[task 2024-06-04T01:04:27.047Z] 01:04:27 INFO - .......................................................................................................................................................................
[task 2024-06-04T01:04:27.047Z] 01:04:27 INFO - TEST-OK | /html/rendering/widgets/button-layout/computed-style.html | took 331ms
[task 2024-06-04T01:04:27.048Z] 01:04:27 INFO - TEST-START | /html/rendering/widgets/button-layout/display-other.html
| Assignee | ||
Updated•1 year ago
|
Comment 36•1 year ago
|
||
Comment 37•1 year ago
|
||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/b7af4ec340b0
https://hg.mozilla.org/mozilla-central/rev/e7f116bf080c
Comment 38•1 year ago
|
||
(In reply to Pulsebot from comment #34)
Pushed by ealvarez@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8f5e711b42ae
Teach the frame constructor to clean up anonymous wrappers. r=dholbert
https://hg.mozilla.org/integration/autoland/rev/d5e50086ff48
Don't reframe containing block on insertion when empty. r=jwatt
Perfherder has detected a browsertime performance change from push d5e50086ff48b2e81664ca686b4cbb206b708b0a.
Improvements:
| Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles |
|---|---|---|---|---|---|
| 13% | speedometer3 Perf-Dashboard/SelectingRange/Sync | android-hw-a51-11-0-aarch64-shippable-qr | webrender | 185.95 -> 161.65 | Before/After |
| 12% | speedometer3 Perf-Dashboard/SelectingRange/Sync | macosx1400-64-shippable-qr | fission webrender | 11.05 -> 9.72 | Before/After |
| 12% | speedometer3 Perf-Dashboard/SelectingRange/Sync | macosx1400-64-shippable-qr | fission webrender | 11.07 -> 9.75 | Before/After |
| 11% | speedometer3 Perf-Dashboard/SelectingRange/total | macosx1400-64-shippable-qr | fission webrender | 11.65 -> 10.33 | Before/After |
| 10% | speedometer3 Perf-Dashboard/SelectingRange/Sync | android-hw-a51-11-0-aarch64-shippable-qr | webrender | 191.57 -> 171.99 | Before/After |
| ... | ... | ... | ... | ... | ... |
| 3% | speedometer3 Perf-Dashboard/total | macosx1015-64-nightlyasrelease-qr | fission webrender | 85.73 -> 82.77 | Before/After |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.
If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.
You can run these tests on try with ./mach try perf --alert 634
For more information on performance sheriffing please see our FAQ.
Comment 39•1 year ago
|
||
(In reply to Pulsebot from comment #36)
Pushed by ealvarez@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b7af4ec340b0
Teach the frame constructor to clean up anonymous wrappers. r=dholbert
https://hg.mozilla.org/integration/autoland/rev/e7f116bf080c
Don't reframe containing block on insertion when empty. r=jwatt
Perfherder has detected a browsertime performance change from push e7f116bf080c7d9ac0fe5fe26a04779a673018ec.
Improvements:
| Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles |
|---|---|---|---|---|---|
| 11% | speedometer3 Perf-Dashboard/SelectingRange/Sync | macosx1400-64-shippable-qr | fission webrender | 11.04 -> 9.78 | Before/After |
| 11% | speedometer3 Perf-Dashboard/SelectingRange/total | macosx1400-64-shippable-qr | fission webrender | 11.64 -> 10.38 | Before/After |
| 4% | speedometer3 Perf-Dashboard/total | macosx1400-64-shippable-qr | fission webrender | 34.12 -> 32.76 | Before/After |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.
If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.
You can run these tests on try with ./mach try perf --alert 616
For more information on performance sheriffing please see our FAQ.
Updated•1 year ago
|
Description
•