Closed Bug 1863491 Opened 11 months ago Closed 10 months ago

Multi second cycle collections on Twitter with LargestContentfulPaint

Categories

(Core :: DOM: Core & HTML, defect, P1)

Firefox 121
defect

Tracking


VERIFIED FIXED
121 Branch
Tracking Status
firefox-esr115 --- unaffected
firefox119 --- unaffected
firefox120 --- unaffected
firefox121 + fixed

People

(Reporter: scineram, Assigned: sefeng)

References

(Regression)

Details

(Keywords: perf-alert, regression)

Attachments

(4 files)

Attached file about.support.txt

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0

Steps to reproduce:

Browsing Twitter (any of my accounts in their container).

Actual results:

Within a few minutes on the last few Nightly updates, Twitter grinds to a halt. Everything loads glacially, including liking or retweeting. After scrolling, clicking on a tweet loads the wrong tweet, presumably because the scroll hasn't caught up. It still happens after disabling uBlock Origin, even in Troubleshooting Mode. Other websites, however, seem to be unaffected.

Expected results:

Should be at least as fast as before. Downgrading to Nightly 2023-10-30-16-49-30 (the only old version I tried) restores usable performance.

The Bugbug bot thinks this bug should belong to the 'Core::Panning and Zooming' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Panning and Zooming
Product: Firefox → Core
Component: Panning and Zooming → Performance

Does it happen in a fresh profile?

If an older Nightly doesn't have the problem, you could use mozregression (https://mozilla.github.io/mozregression/) to figure out what caused it.

Or uploading a profile using https://profiler.firefox.com/ could be helpful.

I reported this and uploaded about:support from an office machine once it updated to a new version. But I first encountered the slowdown on my home computer over the weekend. Now I have updated the office PC to 2023-11-07 and all the slowdown is back. I ran the profiler on Twitter: https://share.firefox.dev/47o1lGZ

Tested more downgrades:

Nightly 2023-11-02-15-48-33 is not broken. Fast, no slowdown on Twitter.
Nightly 2023-11-03-09-38-36 is broken. Twitter gets glacial after a few minutes of browsing.

Thanks for the profile. Looks like a lot of time in cycle collection.

Flags: needinfo?(continuation)

(In reply to John from comment #4)

Tested more downgrades:

Nightly 2023-11-02-15-48-33 is not broken. Fast, no slowdown on Twitter.
Nightly 2023-11-03-09-38-36 is broken. Twitter gets glacial after a few minutes of browsing.

Thanks for testing those. That narrows what caused the problem down to this pushlog:

https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=59754ae1b71386535d42979e577deaaad4aa522f&tochange=63221dffde4f2b6c134c16fb540dea75d856626d

If you are feeling adventurous, you can use mozregression to fetch autoland builds for revisions in the middle of that range to test.

Does switching dom.enable_largest_contentful_paint to false help?

This bug was moved into the Performance component.

:scineram, could you make sure the following information is on this bug?

  • ✅ For slowness or high CPU usage, capture a profile with http://profiler.firefox.com/, upload it and share the link here.
  • For memory usage issues, capture a memory dump from about:memory and attach it to this bug.
  • Troubleshooting information: Go to about:support, click "Copy raw data to clipboard", paste it into a file, save it, and attach the file here.

If the requested information is already in the bug, please confirm it is recent.

Thank you.

Flags: needinfo?(scineram)

It looks like the CC is spending huge amounts of time traversing JS. That usually only happens when a page leaks, but in this case the Twitter process is only using about a dozen MB of memory (except for the CC itself, which is using about 1 GB of memory), so that doesn't make much sense. Maybe the CC CanSkip mechanism is breaking for some other reason.
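(For context: the CanSkip hooks let a cycle-collected class tell the CC that an object is certainly alive, so it can be left out of the CC graph entirely; when that optimization stops applying to a large DOM/JS blob, graph building and traversal time balloon. Below is a rough sketch of the shape of such hooks in Gecko; MyElementHolder and IsCertainlyAlive() are invented names for illustration only, not anything from this bug.)

    // Hypothetical example only: MyElementHolder and IsCertainlyAlive() are
    // made-up names to illustrate the CanSkip hooks, not the real fix here.
    NS_IMPL_CYCLE_COLLECTION_CAN_SKIP_BEGIN(MyElementHolder)
      // Returning true lets the object be removed from the purple buffer
      // without being considered for collection.
      return tmp->IsCertainlyAlive();
    NS_IMPL_CYCLE_COLLECTION_CAN_SKIP_END

    NS_IMPL_CYCLE_COLLECTION_CAN_SKIP_IN_CC_BEGIN(MyElementHolder)
      // Returning true keeps the object out of the CC graph during a collection.
      return tmp->IsCertainlyAlive();
    NS_IMPL_CYCLE_COLLECTION_CAN_SKIP_IN_CC_END

    NS_IMPL_CYCLE_COLLECTION_CAN_SKIP_THIS_BEGIN(MyElementHolder)
      return tmp->IsCertainlyAlive();
    NS_IMPL_CYCLE_COLLECTION_CAN_SKIP_THIS_END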

Flags: needinfo?(continuation)

There was some discussion on Matrix and we thought this might be related to bug 1863104. The regression range matches, and if there's a leak caused by those changes, it would explain what we're seeing here.

I can reproduce this. You have to scroll down a bit so there's a decent amount loaded, but not an absurd amount. The multi-second CCs persist even if you aren't interacting with Twitter and are just letting it sit there.

Summary: twitter grinds to a halt → Multi second cycle collections on Twitter

Setting dom.enable_largest_contentful_paint to false does seem to fix the issue for me, though it is possible I didn't have enough loaded in with the pref set to false.

Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Multi second cycle collections on Twitter → Multi second cycle collections on Twitter with largest contentful paint
Flags: needinfo?(sefeng)
Summary: Multi second cycle collections on Twitter with largest contentful paint → Multi second cycle collections on Twitter with LargestContentfulPaint

My guess here, without looking at any logs, is that we need to make CanSkip do something about the chain of strong references nsGlobalInnerWindow.mPerformance --> PerformanceMainThread::mLargestContentfulPaintEntries --> LargestContentfulPaint::mElement. If an element that holds a lot of stuff alive is not currently in the DOM tree, it could cause this kind of issue. (Assuming there's no mechanism that evicts the LCP entries in that case.)
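To illustrate the ownership problem, here is a minimal standalone C++ model (not Gecko code; the types are simplified stand-ins for the classes named above) of the chain window -> Performance -> LCP entries -> element, showing how a buffered entry keeps a detached element alive:

    #include <memory>
    #include <vector>

    // Simplified stand-ins for the Gecko classes named above.
    struct Element {};  // imagine a detached DOM subtree reaching lots of JS

    struct LargestContentfulPaintEntry {
      std::shared_ptr<Element> element;  // strong, like RefPtr: keeps it alive
    };

    struct Performance {
      std::vector<LargestContentfulPaintEntry> lcpEntries;  // buffered entries
    };

    struct InnerWindow {
      std::shared_ptr<Performance> performance;
    };

    int main() {
      InnerWindow win;
      win.performance = std::make_shared<Performance>();

      auto detached = std::make_shared<Element>();
      win.performance->lcpEntries.push_back({detached});

      // Even after the page drops its own reference, the entry still holds
      // the element, so the cycle collector has to keep traversing whatever
      // the element (and its JS wrappers) can reach.
      detached.reset();
      return win.performance->lcpEntries.front().element ? 0 : 1;  // 0: alive
    }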

This is easy to reproduce for me so I don't think we need more information from the reporter. Thanks for the report!

Component: Performance → DOM: Core & HTML
Flags: needinfo?(scineram)
Keywords: regression
Regressed by: 1722322

Emilio pointed out that while the element is exposed to JS, GetElement uses nsContentUtils::GetAnElementForTiming(), which returns null if !IsInComposedDoc(), so it may be okay to make LargestContentfulPaint::mElement weak.
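A minimal sketch of that direction, again as a standalone model rather than the actual patch, with std::weak_ptr standing in for Gecko's weak element references and an isInComposedDoc flag standing in for IsInComposedDoc():

    #include <memory>

    struct Element {
      bool isInComposedDoc = false;  // simplified stand-in for IsInComposedDoc()
    };

    struct LargestContentfulPaintEntry {
      std::weak_ptr<Element> element;  // weak: no longer keeps the element alive

      // Mirrors the GetAnElementForTiming() behavior described above: only
      // expose the element if it is still alive and still in the tree.
      std::shared_ptr<Element> GetElement() const {
        std::shared_ptr<Element> el = element.lock();
        if (!el || !el->isInComposedDoc) {
          return nullptr;
        }
        return el;
      }
    };

    int main() {
      auto el = std::make_shared<Element>();
      LargestContentfulPaintEntry entry{el};
      el.reset();                         // the page drops the element...
      return entry.GetElement() ? 1 : 0;  // ...and the entry no longer keeps it
    }

Since script already sees null for elements that have left the composed document, weakening the reference should not change observable behavior, only drop the unwanted strong edge the CC has to chase.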

Set release status flags based on info from the regressing bug 1722322

[Tracking Requested - why for this release]: Big performance regression on Twitter that is pretty easy to reproduce. I've also seen complaints about at least one other site.

While away, I did another profiling run of the latest version, this time on the home machine: https://share.firefox.dev/3Svt5Fq

I also have a memory profile, just in case.

Assignee: nobody → sefeng
Status: NEW → ASSIGNED
Pushed by sefeng@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e2c6fa08aa82
Use weak pointers for LargestContentfulPaint to avoid keeping the element alive for unnecessary period r=emilio,mccr8

Backed out for causing multiple crashes @ nsIGlobalObject::GetAsInnerWindow

  • Backout link
  • Push with failures
  • Failure Log
  • Failure line: PROCESS-CRASH | application crashed [@ nsIGlobalObject::GetAsInnerWindow] | toolkit/components/passwordmgr/test/mochitest/test_password_length.html

Log 2: https://treeherder.mozilla.org/logviewer?job_id=435807963&repo=autoland

Log 3: https://treeherder.mozilla.org/logviewer?job_id=435806628&repo=autoland

Pushed by sefeng@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/24b44cfac27e
Use weak pointers for LargestContentfulPaint to avoid keeping the element alive for unnecessary period r=emilio,mccr8

Backed out for crashes related to [@ nsIGlobalObject::GetAsInnerWindow].

  • Backout link
  • Push with failures
  • Failure Log
  • Failure line: PROCESS-CRASH | application crashed [@ nsIGlobalObject::GetAsInnerWindow] | toolkit/components/passwordmgr/test/mochitest/test_password_length.html

Later edit: there are also these mochitests.

Pushed by sefeng@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/cfed378ac68a
Use weak pointers for LargestContentfulPaint to avoid keeping the element alive for unnecessary period r=emilio,mccr8

Backed out for causing build bustages in PerformanceMainThread.cpp.

Severity: -- → S2
Priority: -- → P1
Pushed by sefeng@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/abf27349f36e
Use weak pointers for LargestContentfulPaint to avoid keeping the element alive for unnecessary period r=emilio,mccr8
Status: ASSIGNED → RESOLVED
Closed: 10 months ago
Resolution: --- → FIXED
Target Milestone: --- → 121 Branch
Flags: needinfo?(sefeng)
See Also: → 1863104
Blocks: 1863104
See Also: 1863104
See Also: → 1862909
Duplicate of this bug: 1863604

(In reply to Pulsebot from comment #27)

Pushed by sefeng@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/abf27349f36e
Use weak pointers for LargestContentfulPaint to avoid keeping the element
alive for unnecessary period r=emilio,mccr8

== Change summary for alert #40287 (as of Fri, 17 Nov 2023 13:12:11 GMT) ==

Improvements:

Ratio Test Platform Options Absolute values (old vs new) Performance Profiles
10% matrix-react-bench windows10-64-shippable-qr fission webrender 25.77 -> 23.28 Before/After
8% matrix-react-bench windows10-64-shippable-qr fission webrender 25.24 -> 23.35 Before/After

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=40287

Flags: qe-verify+

(In reply to Pulsebot from comment #25)

Pushed by sefeng@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/cfed378ac68a
Use weak pointers for LargestContentfulPaint to avoid keeping the element
alive for unnecessary period r=emilio,mccr8

== Change summary for alert #40390 (as of Fri, 24 Nov 2023 11:19:38 GMT) ==

Improvements:

Ratio Test Platform Options Absolute values (old vs new) Performance Profiles
10% matrix-react-bench linux1804-64-shippable-qr fission webrender 26.99 -> 24.42 Before/After

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=40390

John, can you please try and see if the issue is still reproducing on your side?
I tried reproducing on Win10 x64 using Firefox build 121.0a1 (20231030164930) from the description, but it did not reproduce on my machine. Thank you.

Flags: needinfo?(scineram)

I have not seen this problem since the fix rolled out to Nightly.

Status: RESOLVED → VERIFIED
Flags: needinfo?(scineram)

== Change summary for alert #40229 (as of Tue, 14 Nov 2023 06:35:30 GMT) ==

Regressions:

Ratio Test Platform Options Absolute values (old vs new)
10% damp browser-toolbox.debugger-ready.DAMP windows10-64-shippable-qr e10s fission stylo webrender-sw 917.95 -> 1,010.57
5% damp browser-toolbox.close-process.DAMP linux1804-64-shippable-qr e10s fission stylo webrender-sw 45.72 -> 48.12
5% damp custom.jsdebugger.pretty-print.DAMP windows10-64-shippable-qr e10s fission stylo webrender 1,553.18 -> 1,626.13
4% damp simple.netmonitor.close.DAMP macosx1015-64-shippable-qr e10s fission stylo webrender 6.97 -> 7.23
4% damp custom.jsdebugger.project-search.first-search-result.DAMP linux1804-64-shippable-qr e10s fission stylo webrender-sw 314.04 -> 325.67

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
106% reload-netmonitor:parent-process objects-with-stacks linux1804-64-shippable-qr 10,737.67 -> -660.33
104% reload-netmonitor:parent-process objects-with-stacks linux1804-64-qr 11,000.81 -> -486.67
12% damp custom.netmonitor.close.DAMP macosx1015-64-shippable-qr e10s fission stylo webrender 22.46 -> 19.71
11% damp custom.netmonitor.close.DAMP windows10-64-shippable-qr e10s fission stylo webrender-sw 26.20 -> 23.28
7% damp server.protocoljs.DAMP linux1804-64-shippable-qr e10s fission stylo webrender-sw 1,086.86 -> 1,008.21

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=40229
