736.8 - 406.99% perf_reftest_singletons remove-children.html / perf_reftest_singletons remove-children.html + 2 more (Linux, OSX, Windows) regression on Mon June 3 2024
Categories
(Core :: Layout, defect)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox-esr115 | --- | unaffected |
| firefox127 | --- | unaffected |
| firefox128 | --- | wontfix |
| firefox129 | --- | wontfix |
People
(Reporter: aglavic, Assigned: emilio)
References
(Regression)
Details
(4 keywords)
Perfherder has detected a talos performance regression from push d5e50086ff48b2e81664ca686b4cbb206b708b0a. As author of one of the patches included in that push, we need your help to address this regression.
Regressions:
| Ratio | Test | Platform | Options | Absolute values (old vs new) |
|---|---|---|---|---|
| 737% | perf_reftest_singletons remove-children.html | macosx1015-64-shippable-qr | e10s fission stylo webrender | 22.09 -> 184.84 |
| 697% | perf_reftest_singletons remove-children.html | windows10-64-shippable-qr | e10s fission stylo webrender | 22.14 -> 176.39 |
| 546% | perf_reftest_singletons remove-children.html | linux1804-64-shippable-qr | e10s fission stylo webrender | 29.84 -> 192.78 |
| 407% | perf_reftest_singletons remove-children.html | linux1804-64-shippable-qr | e10s fission stylo webrender | 29.77 -> 150.93 |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the patch(es) may be backed out in accordance with our regression policy.
If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.
You can run these tests on try with ./mach try perf --alert 650
For more information on performance sheriffing please see our FAQ.
Comment 1•1 year ago
|
||
Here's this "remove-children.html" microbenchmark (added by dshin in bug 1873649):
https://searchfox.org/mozilla-central/rev/10e7fbf7a5a46e10b83becff0f16ffdedc9abe3d/testing/talos/talos/tests/perf-reftest-singletons/remove-children.html
It looks like it builds up a DOM of-the-form:
<table>
<tr></tr>
<tr></tr>
<tr></tr>
[...]
...and removes the 4000 tr elements starting from the front (following a pattern observed in-the-wild based on a code comment in the test).
This seems like behavior that would indeed be impacted by the regressor, bug 1850834, which was targeted at improving the performance of table-part-removal. It's surprising but not-inconceivable that it hurt perf in some such cases.
We should definitely get to the bottom of this.
Comment 2•1 year ago
|
||
(random guess: maybe before this regressed, we were avoiding recreating frames in response to the tr removals, whereas now we're recreating frames on each removal?)
| Assignee | ||
Comment 3•1 year ago
|
||
Most likely we're just re-creating the new lines at the end while before that we weren't, probably due to https://hg.mozilla.org/integration/autoland/rev/8f5e711b42ae6770d416aeeb93a76d74092f89a1#l1.244
Comment 4•1 year ago
|
||
Set release status flags based on info from the regressing bug 1850834
Comment 5•1 year ago
|
||
Emilio, can we assign you to this?
| Assignee | ||
Comment 7•1 year ago
|
||
You'll be able to find a performance comparison here once the tests are complete (ensure you select the right framework): https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=05a5b96f177fe1acb1e6902ab4616b9266909c7c&newProject=try&newRevision=f04b44e1891bc3b3ebd95a5db91d0ccd45126b11
| Assignee | ||
Comment 8•1 year ago
|
||
Ah, so I had misdiagnosed, and I don't think comment 7 would help. The TLDR is that it's not quite a regression from my change (as in, it is on this particular test-case, but not more generally).
In particular, I can reproduce the regression if I add:
| Assignee | ||
Comment 9•1 year ago
|
||
Err, if I add:
diff --git a/testing/talos/talos/tests/perf-reftest-singletons/remove-children.html b/testing/talos/talos/tests/perf-reftest-singletons/remove-children.html
index 4adf9f49ab512..f6a8ff9d4f12f 100644
--- a/testing/talos/talos/tests/perf-reftest-singletons/remove-children.html
+++ b/testing/talos/talos/tests/perf-reftest-singletons/remove-children.html
@@ -1,5 +1,6 @@
<!DOCTYPE html>
-<table id="dut">
+<table>
+ <tbody id="dut"></tbody>
</table>
<script src="util.js"></script>
<script>
To the test-case.
So, few things:
- The main issue seems to be quadratic behavior removing the first element from this array here
- The regression isn't terribly major (10ms -> 100/200ms on my machine, but for a 40k row table well...). Definitely not the kind of issue that we were seeing in the bug that introduced the test.
- We could optimize this case better by using a deque, I guess, but it's not clear if we'd win much.
Thoughts Daniel? In any case I'm pretty sure this is not an S2.
Comment 10•1 year ago
•
|
||
Just making sure I understand correctly: it sounds like you're saying that, with the test-tweak in comment 9 applied, then our recent frame-constructor change (bug 1873649) doesn't impact perf on this testcase anymore (because we're already "slow" on that thusly-modified testcase in builds from before bug 1873649). Is that what you're saying?
If so, I agree that this becomes a bit less worrisome and maybe inactionable, given the other bullet points from comment 3.
(I guess this is a case where blowing away a structure and rebuilding it lazily (after the test does a zillion incremental updates) turns out to be faster than trying to keep that structure alive and valid with each incremental update. But yeah, that's probably not a use-case that we want to optimize for, since it's more common for sites to make smaller surgical changes and we'd prefer to keep those faster.)
Comment 11•1 year ago
|
||
In other words: it sounds like this was a case where making a zillion incremental changes is slower than destroying the whole table and rebuilding what's left when the dust settles. Before bug 1873649, we would eagerly destroy the whole table for silly reasons, which happened to make this particular case "fast". Now we don't destroy it as eagerly, which means we actually get an opportunity to make the incremental changes in-place, which turns out to be slower on-the-whole since the testcase happens to do so many incremental changes.
I'm not overly worried about that.
Comment 12•1 year ago
|
||
Downgrading to S3, though if I'm understanding correctly, we might consider this WONTFIX, since the old behavior was arguably inadvertently-overoptimized-to-benefit-this-specific-case (at the cost of making more common surgical updates slower).
| Assignee | ||
Comment 13•1 year ago
|
||
Your understanding is correct. We can somewhat-easily optimize the "remove first row" by using a deque or something, but making that fast and removing the second slow seems not great. So yeah I lean towards wontfix too...
Updated•1 year ago
|
Description
•