12.12 - 2.16% speedometer3 Charts-observable-plot/Dotted/Sync / speedometer3 Charts-observable-plot/total + 20 more (OSX, Windows) regression on Mon September 16 2024
Categories
(Core :: Disability Access APIs, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr115 | --- | unaffected |
firefox-esr128 | --- | unaffected |
firefox130 | --- | unaffected |
firefox131 | --- | unaffected |
firefox132 | --- | fixed |
firefox133 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: eeejay)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: perf, perf-alert, regression, Whiteboard: [sp3])
Attachments
(1 file)
Perfherder has detected a browsertime performance regression from push d684010261d6886b7cae29c5f7299f4b7d48afca. As author of one of the patches included in that push, we need your help to address this regression.
Regressions:
Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles |
---|---|---|---|---|---|
12% | speedometer3 Charts-observable-plot/Dotted/Sync | windows11-64-shippable-qr | fission webrender | 7.55 -> 8.46 | Before/After |
10% | speedometer3 Charts-observable-plot/Dotted/Sync | windows11-64-nightlyasrelease-qr | fission webrender | 7.35 -> 8.10 | Before/After |
7% | speedometer3 Charts-observable-plot/Dotted/Sync | macosx1400-64-shippable-qr | fission webrender | 4.86 -> 5.19 | Before/After |
6% | speedometer3 Charts-observable-plot/Dotted/total | windows11-64-shippable-qr | fission webrender | 12.55 -> 13.34 | Before/After |
6% | speedometer3 Charts-observable-plot/Dotted/Sync | macosx1015-64-shippable-qr | fission webrender | 11.07 -> 11.71 | Before/After |
6% | speedometer3 Charts-observable-plot/Dotted/total | windows11-64-nightlyasrelease-qr | fission webrender | 12.20 -> 12.87 | Before/After |
5% | speedometer3 Charts-observable-plot/Dotted/total | macosx1400-64-shippable-qr | fission webrender | 7.65 -> 8.01 | Before/After |
5% | stylebench Nth pseudo classes/Adding leaf elements - 4/Sync | windows11-64-shippable-qr | fission webrender | 1.32 -> 1.38 | Before/After |
4% | speedometer3 TodoMVC-jQuery/Adding100Items/Sync | windows11-64-shippable-qr | fission webrender | 37.76 -> 39.33 | Before/After |
4% | speedometer3 Charts-observable-plot/Stacked by 20/Sync | windows11-64-shippable-qr | fission webrender | 16.32 -> 16.94 | Before/After |
... | ... | ... | ... | ... | ... |
3% | speedometer3 TodoMVC-jQuery/total | windows11-64-nightlyasrelease-qr | fission webrender | 147.78 -> 151.93 | Before/After |
3% | speedometer3 Charts-observable-plot/total | windows11-64-shippable-qr | fission webrender | 54.96 -> 56.44 | Before/After |
3% | speedometer3 Charts-observable-plot/Stacked by 20/total | windows11-64-shippable-qr | fission webrender | 21.67 -> 22.23 | Before/After |
2% | stylebench Before and after pseudo elements/Adding classes - 0/Sync | macosx1015-64-shippable-qr | fission webrender | 0.84 -> 0.86 | Before/After |
2% | speedometer3 Charts-observable-plot/total | windows11-64-nightlyasrelease-qr | fission webrender | 53.97 -> 55.14 | Before/After |
Improvements:
Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles |
---|---|---|---|---|---|
6% | speedometer EmberJS-Debug-TodoMVC/CompletingAllItems | linux1804-64-shippable-qr | fission webrender | 242.78 -> 227.35 | Before/After |
6% | speedometer EmberJS-Debug-TodoMVC/CompletingAllItems/Sync | linux1804-64-shippable-qr | fission webrender | 239.50 -> 224.28 | Before/After |
5% | speedometer EmberJS-Debug-TodoMVC/CompletingAllItems | linux1804-64-shippable-qr | fission webrender | 240.96 -> 228.67 | Before/After |
5% | speedometer EmberJS-Debug-TodoMVC | linux1804-64-shippable-qr | fission webrender | 840.38 -> 802.47 | Before/After |
4% | speedometer EmberJS-Debug-TodoMVC/DeletingItems | linux1804-64-shippable-qr | fission webrender | 247.34 -> 236.66 | Before/After |
... | ... | ... | ... | ... | ... |
3% | speedometer EmberJS-Debug-TodoMVC/DeletingItems/Sync | linux1804-64-nightlyasrelease-qr | fission webrender | 244.13 -> 236.13 | Before/After |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the patch(es) may be backed out in accordance with our regression policy.
If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.
You can run these tests on try with ./mach try perf --alert 2113
For more information on performance sheriffing please see our FAQ.
Updated•5 months ago
|
Comment 1•5 months ago
|
||
Set release status flags based on info from the regressing bug 1769586
Assignee | ||
Comment 2•5 months ago
|
||
I'm investigating to see if this is indeed the offending patch set and if so what parts of it are so offensive.
I have a couple of compare views going:
-
Revert patch set off of central:
https://perf.compare/compare-results?baseRev=8bff8a5c5f999dd9b728bf21b8eaca5cced84295&newRev=4ed762a2845da7abb629064502b0309302f70815&baseRepo=try&newRepo=try&framework=13 -
Clamp perf comparison to base rev before patchset was introduced to tip of set:
https://perf.compare/compare-results?baseRev=3b480041751b78102a1a644167e7b0ab26e2b026&baseRepo=autoland&newRev=d684010261d6886b7cae29c5f7299f4b7d48afca&newRepo=autoland&framework=13&search=speedometer -
Remove pieces of patch that might be hot paths.
https://perf.compare/compare-results?baseRev=8bff8a5c5f999dd9b728bf21b8eaca5cced84295&newRev=2331ae5b7921c408b600ba41e2959853fb9692dc&baseRepo=try&newRepo=try&framework=13
So far none of these comparisons are showing similar regressions to what alert 2113 shows.
Assignee | ||
Updated•5 months ago
|
Comment 3•4 months ago
|
||
Set release status flags based on info from the regressing bug 1769586
Assignee | ||
Comment 5•4 months ago
|
||
I am. The fact that this is only a regression in a subtest makes it hard to profile. I'm working on this right now.
Updated•4 months ago
|
Comment 6•4 months ago
|
||
(In reply to Eitan Isaacson [:eeejay] from comment #5)
I am. The fact that this is only a regression in a subtest makes it hard to profile. I'm working on this right now.
Hi Eitan, you can also run just one subtest which can help when profiling by using this url: https://browserbench.org/Speedometer3.0?suites=Charts-observable-plot&iterationCount=100
Comment 7•4 months ago
|
||
FYI, bug 1769586 backs out cleanly from Beta if we don't want to ship this regression in 132. Note that we go to RC next week and have 2 betas left this cycle.
Updated•4 months ago
|
Comment 8•4 months ago
|
||
Hey Bas, the regressor is part of our Interop 2024 initiative, however this also appears to impact SP3... any thoughts on what gets priority?
Assignee | ||
Comment 9•4 months ago
|
||
I'm not sure why this helps, but it looks like checking if the
nsTHashMap is empty before iterating speeds things up.
Assignee | ||
Comment 10•4 months ago
|
||
Comment 11•4 months ago
|
||
I collected better profiles on Android where this is also reproducible.
Before: https://share.firefox.dev/48qvKGB
After: https://share.firefox.dev/4f6O21v
I didn't look at the regressing patch very closely, but did the patch add any additional GC objects or change the frequency of a CC or GC at all that may have caused a GC to occur? The major difference between the two profiles seems to come from a GC major slice caused in the bad profile which is not present before the patch: https://share.firefox.dev/4f3FxEq
Comment 12•4 months ago
|
||
Assignee | ||
Comment 13•4 months ago
|
||
That is the same conclusion I came to. I think the Traverse
phase in FragmentOrElement::nsExtendedDOMSlots
might have taken a hit. I queued a patch that I hope speeds things up. The more extreme solution would be to have mAttrElementsMap
on the heap or wrap it in a Maybe so we can skip it entirely when it is not used (99.9999% of the time).
Comment 14•4 months ago
|
||
bugherder |
Comment 15•4 months ago
|
||
Unfortunately it doesn't seem like the attached patch fixed the regression, you can see here that the Dotted-sync test is still around 8-9ms on Windows when it was 7.5ms before the regression.
Comment 16•4 months ago
•
|
||
In the future, I think it would be better to land patches like this in separate bugs so that we can leave the bug open until we are sure the regression is fixed.
Comment 17•4 months ago
|
||
Let's reopen this bug for now. Usually I'm not a fan of having open bugs with landed patches, because it makes regression and uplift tracking harder, but in this case it's unlikely that the landed patch requires this kind of tracking.
Updated•4 months ago
|
Updated•4 months ago
|
Comment 18•4 months ago
|
||
I think we should actually back this out. SP3 is our flagship benchmark, it seems like we don't completely understand the regression and keeping it in Nightly will impact our ability to measure other changes in these numbers.
Assignee | ||
Comment 19•4 months ago
|
||
I'm trying to come up with other solutions to this. Don't fully understand what is happening. I hope to post something today or tomorrow. This patch was is a significant step in our a11y interop-2024, so its worth fighting for.
Comment 20•4 months ago
|
||
Bug 1769586 has been backed out from both affected branches and the patch from this bug was backed out also. The remaining investigation can happen in bug 1769586.
https://hg.mozilla.org/integration/autoland/rev/5351bf4ac8f676c01bdd7f26254650cdc4cde2cd
Comment 21•4 months ago
|
||
I can confirm Denis' suspicion about GC - it does seem that the problem is that bug 1769586 causes us to allocate more GC memory during the test. We then exceed the GC thresholds earlier and run GC slices earlier and more frequently.
Steps to reproduce:
- Go to https://www.browserbench.org/Speedometer3.0/?suite=Charts-observable-plot&iterationCount=100
- Start the Firefox profiler with the "Nightly" preset.
- Click "Start Test"
- When the test is done, capture the profile.
- Go to the marker chart and search for "gcslice,usertiming"
- Using shift+mousewheel, zoom to the first GCSlice marker.
- In the UserTiming rows, check which iteration the first GCSlice ran in.
On my machine, the first GCSlice runs during iteration-18
in builds before bug 1769586, and in iteration-16
in builds after bug 1769586. I can reproduce this very consistently.
Before: https://share.firefox.dev/3BNOZ0h
After: https://share.firefox.dev/4frwXjd
Comment 22•4 months ago
|
||
JS allocation profiling says that, during the first 14 iterations, Document.createElementNS
allocates 28744 bytes on the JS heap before bug 1769586 and 59416 bytes after, which is pretty close to 2x. I wonder if the JS wrapper for DOM elements has doubled in size due to the new properties.
Before: https://share.firefox.dev/3Af4Rsd
After: https://share.firefox.dev/4h75SDr
Assignee | ||
Comment 23•4 months ago
|
||
Thanks Markus,
This makes sense. I was looking for hot paths in the patch. But it looks like the this is possibly due to the new FrozenArray
attributes in the ARIAMixin.webidl
.
Peter, is there a chance that the added attributes are making the js DOM wrappers that much bigger?
Assignee | ||
Comment 24•4 months ago
|
||
For reference, the FrozenArray
support was added in bug 1891784.
Comment 25•4 months ago
|
||
Each attribute marked with [FrozenArray]
requires a reserved slot on the JS object. We went from 1 to 8 slots, so I think that means the size of the JSObject
went from a size of 4 64-bit values to 10 (JSObject_slots2
to JSObject_slots8
). I'll try to make this more lazily allocated, but it's annoying because it means more special cases in the generated code.
Updated•4 months ago
|
Updated•3 months ago
|
Updated•3 months ago
|
Description
•