Closed Bug 1850834 Opened 2 years ago Closed 1 year ago

We seem to be destroying and recreating too many frames on Perf-Dashboard

Categories

(Core :: Layout, defect)

defect

Tracking

()

RESOLVED FIXED
128 Branch
Tracking Status
firefox128 --- fixed

People

(Reporter: mstange, Assigned: emilio)

References

(Blocks 2 open bugs, Regressed 1 open bug)

Details

(Keywords: perf-alert, Whiteboard: [sp3])

Attachments

(4 files, 1 obsolete file)

https://speedometer-preview.netlify.app/?suite=Perf-Dashboard

On the Speedometer 3 subtest "Perf-Dashboard", we spend too much time in mozilla::PresShell::ContentRemoved, mozilla::RestyleManager::ProcessRestyledFrames, and mozilla::ServoStyleSet::StyleDocument.

This is causing our "Update style" and "set Node.textContent" times to be 2x of Chrome's.

If we can match Chrome's time on these two buckets, our Perf-Dashboard score should improve by 6.8%.

The hypothesis here is that we're hitting a case where we destroy more frames than strictly necessary for the changed DOM elements. For example, we might be destroying the parent frame of the changed element's frame, not just the frame + frame subtree of the changed element. As a result, we spend more time destroying frames, recreating them, and resolving their style.

I tried to create a reduced testcase which shows the perf difference but was unsuccessful. The use of WebComponents makes it a bit tricky.

Comment on attachment 9350939 [details]
an attempt at a reduced testcase for the textContent part

This testcase doesn't even use textContent. I'm not sure what I was thinking when I made it; I made it a few months ago.

Attachment #9350939 - Attachment is obsolete: true

(In reply to Markus Stange [:mstange] from comment #1)

The hypothesis here is that we're hitting a case where we destroy more frames than strictly necessary for the changed DOM elements.

This profile from the Gecko profiler corroborates that hypothesis: https://share.firefox.dev/44DT5kt

During the "SelectingRange-sync" step, there are 17 "MaybeRecreateContainerForFrameRemoval: Table or ruby pseudo parent" and 17 "MaybeRecreateContainerForFrameRemoval: IB split parent" markers.

Blocks: 1856671

Hi TYLin, do you have cycles to investigate what happens in this test?

Flags: needinfo?(aethanyc)

Given the Gecko profiler data in comment 4, investigating not reframing IB-split is probably a good start. I might not get to this one soon, but I'm keeping the NI as a reminder.

(In reply to Markus Stange [:mstange] from comment #0)

https://speedometer-preview.netlify.app/?suite=Perf-Dashboard
[...]
If we can match Chrome's time on these two buckets, our Perf-Dashboard score should improve by 6.8%.

Is this still an issue?

Running the "Perf-Dashboard" in current Nightly with 60 iterations to reduce noise, I get consistently better (higher) scores in Firefox Nightly vs. Chrome.

On Linux ThinkStation:
Firefox 126.0a1 (2024-03-25): 18.2 ± 0.33
Chrome 123.0.6312.58 (Official Build): 17 ± 0.32

If I click Details and then expand the table to look specifically at the "SelectingRange-sync" subtest referenced in comment 4, I see we're faster (lower time in ms) than Chrome on that specific subtest, too.
Firefox: SelectingRange/Sync 21.77 ± 1.19 (5.5%) ms
Chrome: SelectingRange/Sync 22.92 ± 0.33 (1.5%) ms

Maybe this has been fixed by other speedometer work that we've landed, which got us better batching of invalidations here, or something along those lines?

Flags: needinfo?(mstange.moz)

[macOS]
I ran this on macOS as well, and my results are more variable[1] there (not sure why), but the best score that I'm getting for Firefox there is 11.98 ± 0.19, vs. 8.923 ± 0.086 on Chrome. (higher is better)

In those runs, Details|submetrics shows that we're spending 30.53ms in SelectingRange/Sync, whereas Chrome is spending 42.16ms in that submetric (lower is better).

So it looks like we're winning there too.

[1] In repeated reloads, I sometimes saw scores of ~8 or even ~6, but I did get 11 several times. No idea what's leading to the lower scores, but I suspect it's independent of the original issue that spawned this bug, which was a more "we're slower on-the-whole" issue, if I understand correctly.

Yes, this issue still exists.

We're currently winning on this test because we're faster at other things, for example at canvasCtx.arc(). Once Chrome catches up on those other things, we'll be slower again. Reducing our time spent in frame construction and styling to Chrome's level is still the biggest win to be had on this test - and the hope is that reconstructing fewer frames will achieve that.

Flags: needinfo?(mstange.moz)

Here's a pernosco recording:
https://pernos.co/debug/7tYavUofQwOIZaugFjYFtA/index.html

That has Firefox running this part of the benchmark with 2 iterations, using this URL:
https://speedometer-preview.netlify.app/?suite=Perf-Dashboard&iterationCount=2

Attached file Relevant debug log

It seems we reframe the whole <body> when reconstructing the <div> inside the <page-component> element.

Reclassifying as "defect" for the user-facing perf impact here (though the setup is a bit pathological, and we're not like orders-of-magnitude worse than the competition on the known content that triggers this [speedometer]; hence, classifying as S3)

Severity: -- → S3
Type: task → defect

That's the first slow case we hit on this page at least, which is optimizable (it's just removing in the middle of an IB-split). Daniel found that the recorded cases have a further issue which is that we're setting .textContent on an anonymous table row, we remove the anonymous wrappers, and then we add them back again.

This is something like:

<span>
<table id="build-revision"></table>
</span>

Where build-revision's .textContent changes, and we reframe the containing block.

That's all kinda silly because, in this case:

  • We don't need to reconstruct the <table> at all. We do because inserting text on a <table> creates a bunch of anonymous wrappers that we need to clean-up. In this case we could just remove the pseudo-parents.
  • We don't need to reconstruct the ib-split-ancestor either. Just replacing the <table> would be fine.

But finding the right insertion point if we leave the anonymous ib-split block hanging around becomes rather tricky.

Attachment #9404909 - Attachment description: testcase 1 (microbenchmark for mutating textContent of a table that forms an IB split with an expensive-to-recreate neighbor → testcase 1 (microbenchmark for mutating textContent of a table that forms an IB split with an expensive-to-recreate neighbor; measurements in console.log

STR with testcase 1:

  1. Wait for testcase to load (a few seconds on my machine)
  2. Open devtools (F12)
  3. Click the button and see what gets logged to devtools.
  4. (optional) Click the button more times.

FIREFOX RESULTS:
The logged measurements are on the order of 700ms for me.

CHROME RESULTS:
The logged measurements are 0-1ms.

(The 700x difference isn't a really reflective of real-world content; it's just because I've made the testcase extra-pathological to make the issue easy to see.)

Here's a profile where I click the button in testcase 1 three two times: https://share.firefox.dev/4aCwhEI

The resulting multi-hundred-millisecond restyles are the frame reconstruction in question here.

(Side note: my profile in comment 16 is using 0.1ms as the sampling interval, so the sample counts might be higher than normal. Don't be fooled into thinking sample-count = milliseconds as it often would be by default.)

Here's the time spent in "set Node.textContent" in that profile of my testcase 1:
https://share.firefox.dev/3wXwtjY

There's one long burst of set Node.textContent activity for each button press, ~170ms each, nearly all of which is spent in frame destruction. (And then it's followed by a ~200ms restyle which is really just reconstructing all those frames we just destroyed, and then a bunch of time spent reflowing those freshly constructed frames, and some DevTools work to account for the fresh geometry since I had devtools open.)

If you look down the callstack of where we're spending time there, you can see there are several nested MaybeRecreateContainerForFrameRemoval, each of which indeed decides to recreate the container frame. Each level that we go down the callstack there, we're actually walking up the frame tree. Here's how our analysis goes:

  1. We trivially have to recreate the table contents (because its textContent was changed, i.e. the entire guts of the table are being replaced).
  2. Since we're reframing the table-contents: that makes us consider whether we should recreate the table itself, and we decide that we should.
  3. Since we're (temporarily) removing the table: that makes us consider whether we need to recreate the table's containing-block, since the table is part of an IB split (and removing the table removes the need for that IB split). And we decide that we should.

Parts (2) and (3) are both silly for the reasons emilio noted in comment 13.

Note: another silly aspect here is that textContent isn't being set to anything interesting; it's being set to the empty-string as preparation for giving the table new contents, via the call to renderReplace here I think:
https://github.com/WebKit/Speedometer/blob/33dec4f3282394adbe985724be1cebe8928d87fe/resources/perf.webkit.org/public/v3/pages/chart-pane-status-view.js#L72

Inside renderReplace we set the element's textContent to the empty string:
https://github.com/WebKit/Speedometer/blob/33dec4f3282394adbe985724be1cebe8928d87fe/resources/perf.webkit.org/public/shared/common-component-base.js#L9

...before appending the passed-in new content.

I'm not sure if there are any heuristics that might fall out of that, to make this more straightforward... just noting it for the record, anyway. e.g. maybe we could detect that we're entirely-emptying-the-table, and land on an intermediate state where we leave an empty table frame (or empty table-wrapper frame) in-place, without any content? I don't recall the exact invariants around tables & if either of those might be a valid state to land in...

Assignee: nobody → emilio
Flags: needinfo?(aethanyc)
You'll be able to find a performance comparison here once the tests are complete (ensure you select the right framework): https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=69b3fd5419a059f60ce5a7a77a640a050f43b1a8&newProject=try&newRevision=4d866497b7a70b3c2bae3b05493dabb854562664


*******************************************************
*          2 commits/try-runs are created...          *
*******************************************************
Base revision's try run: https://treeherder.mozilla.org/jobs?repo=try&revision=69b3fd5419a059f60ce5a7a77a640a050f43b1a8
Local revision's try run: https://treeherder.mozilla.org/jobs?repo=try&revision=4d866497b7a70b3c2bae3b05493dabb854562664

I need more retriggers to get more confidence on the try run in comment 20, but retriggers are busted at the moment (see #sheriffs). The numbers look in the right general direction tho.

Markus, on a local build from above, I don't see any meaningful frame destruction taking more time, can you confirm if you have the time? I don't get symbols from a build from try in the profiler...

Flags: needinfo?(mstange.moz)

I'll get a comparison report with the try build on Monday.

I had to implement a more aggressive optimization because we were still hitting the slow path in the benchmark. It was a bit more work but not too much, and all the layout/reftests/table-anonymous-boxes tests and ruby tests still pass...

Updated try run: https://treeherder.mozilla.org/jobs?repo=try&revision=31d382a20054d60727e23205c49a4b13cb894a01

You'll be able to find a performance comparison here once the tests are complete (ensure you select the right framework): https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=69b3fd5419a059f60ce5a7a77a640a050f43b1a8&newProject=try&newRevision=1ffda75b1f16166a0de6ddb43b4475c07050076e


*******************************************************
*          2 commits/try-runs are created...          *
*******************************************************
Base revision's try run: https://treeherder.mozilla.org/jobs?repo=try&revision=69b3fd5419a059f60ce5a7a77a640a050f43b1a8
Local revision's try run: https://treeherder.mozilla.org/jobs?repo=try&revision=1ffda75b1f16166a0de6ddb43b4475c07050076e

If you need any help, you can find us in the #perf-help Matrix channel:
https://matrix.to/#/#perf-help:mozilla.org

For more information on the performance tests, see our PerfDocs here:
https://firefox-source-docs.mozilla.org/testing/perfdocs/

(In reply to Emilio Cobos Álvarez (:emilio) from comment #24)

I had to implement a more aggressive optimization because we were still hitting the slow path in the benchmark. It was a bit more work but not too much, and all the layout/reftests/table-anonymous-boxes tests and ruby tests still pass...

After the patch the slow code-path is gone. We still do hit other slow paths related to IB-splits, and we hit this on insertion a lot more. So maybe there's more to fix. But I think the patch as-is is still worth landing.

Let's see what try says, I do see less frame reconstruction in the profile at a glance, but whether it's less enough...

(In reply to Emilio Cobos Álvarez (:emilio) from comment #22)

I don't get symbols from a build from try in the profiler...

You need to trigger the build-linux64-shippable/opt-upload-symbols job for this, and then profile the shippable build. I've triggered it on your pushes now.

This is also actually needed to avoid the perf cliff in Perf-Dashboard.

Depends on: 1900278

Note, I added a mochitest for this in bug 1900278, using the framesConstructed API to check that IB splits don't get additional frames constructed.

The two patches here each make us pass different pieces of that test. With the patches here combined, we pass the whole test, except for one sub-piece which I've marked as an expected-failure and filed bug 1900279 to cover.

My previous patch had a typo, new try:

https://treeherder.mozilla.org/jobs?repo=try&revision=0b168e14a569306e5dba56a5dbd5e084397f37c2

You'll be able to find a performance comparison here once the tests are complete (ensure you select the right framework): https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=45a539e153a7b515bcc0ec1b4a045766f069f123&newProject=try&newRevision=ddacb492558c22014a9eeedee0cf59e56c2c9992


*******************************************************
*          2 commits/try-runs are created...          *
*******************************************************
Base revision's try run: https://treeherder.mozilla.org/jobs?repo=try&revision=45a539e153a7b515bcc0ec1b4a045766f069f123
Local revision's try run: https://treeherder.mozilla.org/jobs?repo=try&revision=ddacb492558c22014a9eeedee0cf59e56c2c9992

If you need any help, you can find us in the #perf-help Matrix channel:
https://matrix.to/#/#perf-help:mozilla.org

For more information on the performance tests, see our PerfDocs here:
https://firefox-source-docs.mozilla.org/testing/perfdocs/

(Markus answered on Matrix)

Flags: needinfo?(mstange.moz)
Pushed by ealvarez@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8f5e711b42ae Teach the frame constructor to clean up anonymous wrappers. r=dholbert https://hg.mozilla.org/integration/autoland/rev/d5e50086ff48 Don't reframe containing block on insertion when empty. r=jwatt

Backed out for causing wpt failures on computed-style.html

[task 2024-06-04T01:04:26.713Z] 01:04:26     INFO - TEST-START | /html/rendering/widgets/button-layout/computed-style.html
[task 2024-06-04T01:04:26.713Z] 01:04:26     INFO - Closing window 1dea5243-ed5a-4ac5-814b-869726bb233c
[task 2024-06-04T01:04:26.902Z] 01:04:26     INFO - PID 1699 |  [Child 1711, Main Thread] ###!!! ASSERTION: inserting after sibling frame with different parent: '!aPrevFrame || aPrevFrame->GetParent() == this', file /builds/worker/checkouts/gecko/layout/tables/nsTableFrame.cpp:2103
[task 2024-06-04T01:04:26.991Z] 01:04:26     INFO - Got content assert count 1
[task 2024-06-04T01:04:27.046Z] 01:04:27     INFO - TEST-UNEXPECTED-FAIL | /html/rendering/widgets/button-layout/computed-style.html | assertion count 1 is more than expected 0 assertions
[task 2024-06-04T01:04:27.047Z] 01:04:27     INFO - .......................................................................................................................................................................
[task 2024-06-04T01:04:27.047Z] 01:04:27     INFO - TEST-OK | /html/rendering/widgets/button-layout/computed-style.html | took 331ms
[task 2024-06-04T01:04:27.048Z] 01:04:27     INFO - TEST-START | /html/rendering/widgets/button-layout/display-other.html
Flags: needinfo?(emilio)
Flags: needinfo?(emilio)
Pushed by ealvarez@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b7af4ec340b0 Teach the frame constructor to clean up anonymous wrappers. r=dholbert https://hg.mozilla.org/integration/autoland/rev/e7f116bf080c Don't reframe containing block on insertion when empty. r=jwatt
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 128 Branch

(In reply to Pulsebot from comment #34)

Pushed by ealvarez@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8f5e711b42ae
Teach the frame constructor to clean up anonymous wrappers. r=dholbert
https://hg.mozilla.org/integration/autoland/rev/d5e50086ff48
Don't reframe containing block on insertion when empty. r=jwatt

Perfherder has detected a browsertime performance change from push d5e50086ff48b2e81664ca686b4cbb206b708b0a.

Improvements:

Ratio Test Platform Options Absolute values (old vs new) Performance Profiles
13% speedometer3 Perf-Dashboard/SelectingRange/Sync android-hw-a51-11-0-aarch64-shippable-qr webrender 185.95 -> 161.65 Before/After
12% speedometer3 Perf-Dashboard/SelectingRange/Sync macosx1400-64-shippable-qr fission webrender 11.05 -> 9.72 Before/After
12% speedometer3 Perf-Dashboard/SelectingRange/Sync macosx1400-64-shippable-qr fission webrender 11.07 -> 9.75 Before/After
11% speedometer3 Perf-Dashboard/SelectingRange/total macosx1400-64-shippable-qr fission webrender 11.65 -> 10.33 Before/After
10% speedometer3 Perf-Dashboard/SelectingRange/Sync android-hw-a51-11-0-aarch64-shippable-qr webrender 191.57 -> 171.99 Before/After
... ... ... ... ... ...
3% speedometer3 Perf-Dashboard/total macosx1015-64-nightlyasrelease-qr fission webrender 85.73 -> 82.77 Before/After

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

You can run these tests on try with ./mach try perf --alert 634

For more information on performance sheriffing please see our FAQ.

Keywords: perf-alert
Regressions: 1901213

(In reply to Pulsebot from comment #36)

Pushed by ealvarez@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b7af4ec340b0
Teach the frame constructor to clean up anonymous wrappers. r=dholbert
https://hg.mozilla.org/integration/autoland/rev/e7f116bf080c
Don't reframe containing block on insertion when empty. r=jwatt

Perfherder has detected a browsertime performance change from push e7f116bf080c7d9ac0fe5fe26a04779a673018ec.

Improvements:

Ratio Test Platform Options Absolute values (old vs new) Performance Profiles
11% speedometer3 Perf-Dashboard/SelectingRange/Sync macosx1400-64-shippable-qr fission webrender 11.04 -> 9.78 Before/After
11% speedometer3 Perf-Dashboard/SelectingRange/total macosx1400-64-shippable-qr fission webrender 11.64 -> 10.38 Before/After
4% speedometer3 Perf-Dashboard/total macosx1400-64-shippable-qr fission webrender 34.12 -> 32.76 Before/After

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

You can run these tests on try with ./mach try perf --alert 616

For more information on performance sheriffing please see our FAQ.

Regressions: 1901321
Regressions: 1902156
Regressions: 1901233
Regressions: 1902540
Regressions: 1907289
Regressions: 1907789
Regressed by: 1909381
No longer regressed by: 1909381
Regressions: 1909381
Duplicate of this bug: 1474818
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: