Closed Bug 1920082 Opened 5 months ago Closed 4 months ago

12.12 - 2.16% speedometer3 Charts-observable-plot/Dotted/Sync / speedometer3 Charts-observable-plot/total + 20 more (OSX, Windows) regression on Mon September 16 2024

Categories

(Core :: Disability Access APIs, defect, P1)


Tracking


RESOLVED FIXED
133 Branch
Performance Impact ?
Tracking Status
firefox-esr115 --- unaffected
firefox-esr128 --- unaffected
firefox130 --- unaffected
firefox131 --- unaffected
firefox132 --- fixed
firefox133 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: eeejay)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: perf, perf-alert, regression, Whiteboard: [sp3])

Attachments

(1 file)

Perfherder has detected a browsertime performance regression from push d684010261d6886b7cae29c5f7299f4b7d48afca. Since you are the author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio Test Platform Options Absolute values (old vs new) Performance Profiles
12% speedometer3 Charts-observable-plot/Dotted/Sync windows11-64-shippable-qr fission webrender 7.55 -> 8.46 Before/After
10% speedometer3 Charts-observable-plot/Dotted/Sync windows11-64-nightlyasrelease-qr fission webrender 7.35 -> 8.10 Before/After
7% speedometer3 Charts-observable-plot/Dotted/Sync macosx1400-64-shippable-qr fission webrender 4.86 -> 5.19 Before/After
6% speedometer3 Charts-observable-plot/Dotted/total windows11-64-shippable-qr fission webrender 12.55 -> 13.34 Before/After
6% speedometer3 Charts-observable-plot/Dotted/Sync macosx1015-64-shippable-qr fission webrender 11.07 -> 11.71 Before/After
6% speedometer3 Charts-observable-plot/Dotted/total windows11-64-nightlyasrelease-qr fission webrender 12.20 -> 12.87 Before/After
5% speedometer3 Charts-observable-plot/Dotted/total macosx1400-64-shippable-qr fission webrender 7.65 -> 8.01 Before/After
5% stylebench Nth pseudo classes/Adding leaf elements - 4/Sync windows11-64-shippable-qr fission webrender 1.32 -> 1.38 Before/After
4% speedometer3 TodoMVC-jQuery/Adding100Items/Sync windows11-64-shippable-qr fission webrender 37.76 -> 39.33 Before/After
4% speedometer3 Charts-observable-plot/Stacked by 20/Sync windows11-64-shippable-qr fission webrender 16.32 -> 16.94 Before/After
... ... ... ... ... ...
3% speedometer3 TodoMVC-jQuery/total windows11-64-nightlyasrelease-qr fission webrender 147.78 -> 151.93 Before/After
3% speedometer3 Charts-observable-plot/total windows11-64-shippable-qr fission webrender 54.96 -> 56.44 Before/After
3% speedometer3 Charts-observable-plot/Stacked by 20/total windows11-64-shippable-qr fission webrender 21.67 -> 22.23 Before/After
2% stylebench Before and after pseudo elements/Adding classes - 0/Sync macosx1015-64-shippable-qr fission webrender 0.84 -> 0.86 Before/After
2% speedometer3 Charts-observable-plot/total windows11-64-nightlyasrelease-qr fission webrender 53.97 -> 55.14 Before/After

Improvements:

Ratio Test Platform Options Absolute values (old vs new) Performance Profiles
6% speedometer EmberJS-Debug-TodoMVC/CompletingAllItems linux1804-64-shippable-qr fission webrender 242.78 -> 227.35 Before/After
6% speedometer EmberJS-Debug-TodoMVC/CompletingAllItems/Sync linux1804-64-shippable-qr fission webrender 239.50 -> 224.28 Before/After
5% speedometer EmberJS-Debug-TodoMVC/CompletingAllItems linux1804-64-shippable-qr fission webrender 240.96 -> 228.67 Before/After
5% speedometer EmberJS-Debug-TodoMVC linux1804-64-shippable-qr fission webrender 840.38 -> 802.47 Before/After
4% speedometer EmberJS-Debug-TodoMVC/DeletingItems linux1804-64-shippable-qr fission webrender 247.34 -> 236.66 Before/After
... ... ... ... ... ...
3% speedometer EmberJS-Debug-TodoMVC/DeletingItems/Sync linux1804-64-nightlyasrelease-qr fission webrender 244.13 -> 236.13 Before/After

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the patch(es) may be backed out in accordance with our regression policy.

If you need the profiling jobs, you can trigger them yourself from the Treeherder job view or ask a sheriff to do that for you.

You can run these tests on try with ./mach try perf --alert 2113

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(eitan)

Set release status flags based on info from the regressing bug 1769586

I'm investigating to see if this is indeed the offending patch set and, if so, which parts of it are so offensive.

I have a couple of compare views going:

  1. Revert patch set off of central:
    https://perf.compare/compare-results?baseRev=8bff8a5c5f999dd9b728bf21b8eaca5cced84295&newRev=4ed762a2845da7abb629064502b0309302f70815&baseRepo=try&newRepo=try&framework=13

  2. Clamp the perf comparison from the base rev before the patch set landed to the tip of the set:
    https://perf.compare/compare-results?baseRev=3b480041751b78102a1a644167e7b0ab26e2b026&baseRepo=autoland&newRev=d684010261d6886b7cae29c5f7299f4b7d48afca&newRepo=autoland&framework=13&search=speedometer

  3. Remove pieces of the patch that might be hot paths:
    https://perf.compare/compare-results?baseRev=8bff8a5c5f999dd9b728bf21b8eaca5cced84295&newRev=2331ae5b7921c408b600ba41e2959853fb9692dc&baseRepo=try&newRepo=try&framework=13

So far none of these comparisons are showing similar regressions to what alert 2113 shows.

Flags: needinfo?(eitan)
Severity: -- → S3
Priority: -- → P1

Set release status flags based on info from the regressing bug 1769586

Are you still looking into this, Eitan?

Flags: needinfo?(eitan)

I am. The fact that this is only a regression in a subtest makes it hard to profile. I'm working on this right now.

Flags: needinfo?(eitan)
Assignee: nobody → eitan

(In reply to Eitan Isaacson [:eeejay] from comment #5)

I am. The fact that this is only a regression in a subtest makes it hard to profile. I'm working on this right now.

Hi Eitan, you can also run just one subtest, which can help when profiling, by using this URL: https://browserbench.org/Speedometer3.0?suites=Charts-observable-plot&iterationCount=100

FYI, bug 1769586 backs out cleanly from Beta if we don't want to ship this regression in 132. Note that we go to RC next week and have 2 betas left this cycle.

Performance Impact: --- → ?

Hey Bas, the regressor is part of our Interop 2024 initiative; however, this also appears to impact SP3... any thoughts on what gets priority?

Flags: needinfo?(bas)

I'm not sure why this helps, but it looks like checking if the
nsTHashMap is empty before iterating speeds things up.
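
For reference, a minimal sketch of the idea, assuming mAttrElementsMap is an nsTHashMap member that gets walked during cycle-collection Traverse; the names and value type here are illustrative, not the exact Gecko code:

    // Bail out before setting up iteration when the map is empty, which is
    // the overwhelmingly common case for DOM elements.
    void TraverseAttrElementsMap(nsCycleCollectionTraversalCallback& aCb) {
      if (mAttrElementsMap.IsEmpty()) {
        return;
      }
      for (const auto& entry : mAttrElementsMap) {
        // Assuming the values hold RefPtr<dom::Element> references.
        CycleCollectionNoteChild(aCb, entry.GetData().get(), "mAttrElementsMap entry");
      }
    }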

I collected better profiles on Android where this is also reproducible.

Before: https://share.firefox.dev/48qvKGB
After: https://share.firefox.dev/4f6O21v

I didn't look at the regressing patch very closely, but did the patch add any additional GC objects, or change the frequency of CC or GC at all, in a way that may have caused a GC to occur? The major difference between the two profiles seems to come from a major GC slice in the bad profile that is not present before the patch: https://share.firefox.dev/4f3FxEq

Pushed by eisaacson@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a93b274a0bcd Check if mAttrElementsMap is empty before CC traversing it. r=dom-core,jjaschke

That is the same conclusion I came to. I think the Traverse phase in FragmentOrElement::nsExtendedDOMSlots might have taken a hit. I queued a patch that I hope speeds things up. The more extreme solution would be to have mAttrElementsMap on the heap or wrap it in a Maybe so we can skip it entirely when it is not used (99.9999% of the time).
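
For illustration, a rough sketch of that more extreme option, assuming the member is wrapped in a mozilla::Maybe so the table is only constructed on first use; the member layout and value type are hypothetical:

    // Nothing() for the ~99.9999% of elements that never use attr-element
    // associations; only engaged when the first entry is added.
    mozilla::Maybe<nsTHashMap<nsRefPtrHashKey<nsAtom>, RefPtr<dom::Element>>> mAttrElementsMap;

    void SetAttrElement(nsAtom* aAttr, dom::Element* aElement) {
      if (mAttrElementsMap.isNothing()) {
        mAttrElementsMap.emplace();  // allocate the table lazily
      }
      mAttrElementsMap->InsertOrUpdate(aAttr, RefPtr{aElement});
    }

Traverse, unlink, and lookups would then cheaply skip the Nothing() case for elements that never set any of these attributes.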

Status: NEW → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → 133 Branch

Unfortunately, it doesn't seem like the attached patch fixed the regression; you can see here that the Dotted/Sync test is still around 8-9 ms on Windows, when it was 7.5 ms before the regression.

In the future, I think it would be better to land patches like this in separate bugs so that we can leave the bug open until we are sure the regression is fixed.

Let's reopen this bug for now. Usually I'm not a fan of having open bugs with landed patches, because it makes regression and uplift tracking harder, but in this case it's unlikely that the landed patch requires this kind of tracking.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 133 Branch → ---
Flags: needinfo?(bas)

I think we should actually back this out. SP3 is our flagship benchmark, it seems like we don't completely understand the regression and keeping it in Nightly will impact our ability to measure other changes in these numbers.

I'm trying to come up with other solutions to this. I don't fully understand what is happening yet. I hope to post something today or tomorrow. This patch is a significant step in our a11y Interop 2024 work, so it's worth fighting for.

Bug 1769586 has been backed out from both affected branches, and the patch from this bug was backed out as well. The remaining investigation can happen in bug 1769586.
https://hg.mozilla.org/integration/autoland/rev/5351bf4ac8f676c01bdd7f26254650cdc4cde2cd

Status: REOPENED → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED

I can confirm Denis' suspicion about GC - it does seem that the problem is that bug 1769586 causes us to allocate more GC memory during the test. We then exceed the GC thresholds earlier and run GC slices earlier and more frequently.

Steps to reproduce:

  1. Go to https://www.browserbench.org/Speedometer3.0/?suite=Charts-observable-plot&iterationCount=100
  2. Start the Firefox profiler with the "Nightly" preset.
  3. Click "Start Test"
  4. When the test is done, capture the profile.
  5. Go to the marker chart and search for "gcslice,usertiming"
  6. Using shift+mousewheel, zoom to the first GCSlice marker.
  7. In the UserTiming rows, check which iteration the first GCSlice ran in.

On my machine, the first GCSlice runs during iteration-18 in builds before bug 1769586, and in iteration-16 in builds after bug 1769586. I can reproduce this very consistently.
Before: https://share.firefox.dev/3BNOZ0h
After: https://share.firefox.dev/4frwXjd

JS allocation profiling says that, during the first 14 iterations, Document.createElementNS allocates 28744 bytes on the JS heap before bug 1769586 and 59416 bytes after, which is pretty close to 2x. I wonder if the JS wrapper for DOM elements has doubled in size due to the new properties.
Before: https://share.firefox.dev/3Af4Rsd
After: https://share.firefox.dev/4h75SDr

Thanks Markus,
This makes sense. I was looking for hot paths in the patch, but it looks like this is possibly due to the new FrozenArray attributes in ARIAMixin.webidl.

Peter, is there a chance that the added attributes are making the JS DOM wrappers that much bigger?

Flags: needinfo?(peterv)

For reference, the FrozenArray support was added in bug 1891784.

Each attribute marked with [FrozenArray] requires a reserved slot on the JS object. We went from 1 to 8 slots, so I think that means the size of the JSObject went from 4 64-bit values to 10 (JSObject_slots2 to JSObject_slots8). I'll try to make this more lazily allocated, but it's annoying because it means more special cases in the generated code.
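
To make the cost concrete, here is a hypothetical sketch of why each [FrozenArray] attribute needs its own reserved slot: the bindings cache the frozen array on the DOM reflector so repeated reads return the same JS object. The slot constant and getter name below are made up; the real code is emitted by the WebIDL codegen.

    // Illustrative only.
    static const uint32_t SLOT_ARIA_LABELLEDBY = 3;  // hypothetical slot index

    bool GetAriaLabelledByElements(JSContext* aCx, JS::Handle<JSObject*> aReflector,
                                   JS::MutableHandle<JS::Value> aRval) {
      JS::Value cached = JS_GetReservedSlot(aReflector, SLOT_ARIA_LABELLEDBY);
      if (!cached.isUndefined()) {
        aRval.set(cached);  // cache hit: hand back the previously frozen array
        return true;
      }
      // Otherwise: build the array of elements, Object.freeze() it, then stash it:
      // JS_SetReservedSlot(aReflector, SLOT_ARIA_LABELLEDBY, frozenArrayValue);
      return true;
    }

Because those reserved slots are part of each reflector's fixed allocation, going from 1 to 8 of them pushes every element wrapper into a larger size class even when none of the arrays are ever read.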

Flags: needinfo?(peterv)
See Also: → 1926226
Target Milestone: --- → 133 Branch
Whiteboard: [sp3]
