Closed Bug 1815151 Opened 1 year ago Closed 1 year ago

Slow tab closes

Categories

(Firefox :: Sync, defect)

defect

Tracking

()

RESOLVED FIXED
113 Branch
Performance Impact medium
Tracking Status
firefox111 --- wontfix
firefox112 --- wontfix
firefox113 --- fixed

People

(Reporter: yoasif, Assigned: skhamis)

References

Details

(Whiteboard: [fxsync-])

Attachments

(2 files)

​​### Basic information

Been noticing this for a little while now, and it is hard to track down what is going on. Disabled a bunch of extensions to see if I could resolve it, but seeing slow tab closes.

Steps to Reproduce:

  1. Open large restored session.
  2. Close tab via ctrl-w

Expected Results:

Next tab opens quickly and loads.

Actual Results:

This seems to stutter and jank.


Performance recording (profile)

Profile URL: https://share.firefox.dev/3ldqnWz

System configuration:

OS version: Fedora 37
GPU model: Mesa Intel® UHD Graphics (TGL GT2)
Number of cores: 11th Gen Intel® Core™ i3-1125G4 × 8
Amount of memory (RAM): 64gb

More information

Please consider attaching the following information after filing this bug, if relevant:

  • Screenshot / screen recording
  • Anonymized about:memory dump, for issues with memory usage
  • Troubleshooting information: Go to about:support, click "Copy text to clipboard", paste it to a file, save it, and attach the file here.

Thanks so much for your help.

Attached file about:support

The Performance Impact Calculator has determined this bug's performance impact to be medium. If you'd like to request re-triage, you can reset the Performance Impact flag to "?" or needinfo the triage sheriff.

Platforms: Windows
Impact on browser: Causes noticeable jank

This appears to be spending most of its time in something called 'TryFitItems', Mike might have an idea of what to do here.

Performance Impact: --- → medium
Component: Performance → General
Flags: needinfo?(mconley)
Product: Core → Firefox

So there are quite a few things going on in this profile. The main thread in the parent process is janking several times, but if we want to focus in on the tab close / tab switch, we can do that.

This is a slice of the original profile that spans from the beginning of the cmd_close command to close the tab to the actual tab switch of that tab close. From start to finish, this takes 3.1 seconds, which is pretty brutal.

The first chunk of that time, the parent process is spinning a nested event loop waiting to hear back from the content process. That wait is 1.1s. Unfortunately, I can't see any samples from the Reddit page that's taking so long to respond, but presumably it's main thread is blocked and can't service the PermitUnload request fast enough.

A significant chunk of time (about 826ms) is spent preparing to close the tab. There's nothing super expensive going on here, so it looks more like the thread just isn't getting serviced - I see a bunch of dropped samples in that region.

Then there are some long style and layout flushes for updating the tab strip before the new tab is foregrounded, ending the tab close operation.

So my conclusion is that this machine was highly encumbered when this profile was gathered.

I wonder if there's more we can do to prioritize operations in the foreground window? From what I can tell, there were 6 windows open at the time that this profile was gathered. Asif, do you recall if all windows were visible when you dumped the profile? Or were some overlapping / occluding each other?

Flags: needinfo?(mconley) → needinfo?(yoasif)

I wonder if there's more we can do to prioritize operations in the foreground window? From what I can tell, there were 6 windows open at the time that this profile was gathered. Asif, do you recall if all windows were visible when you dumped the profile? Or were some overlapping / occluding each other?

Mike, I would think that all of at most, all but one or two windows would be occluded - I have one display on this machine and windows are either maximized or there is at most one window in the foreground that is not maximized, with another window maximized (occluding the rest).

Flags: needinfo?(yoasif)

:mconley, we're doing firefox::general triage and I'm not clear what the best next steps should be here. Can you suggest?

Flags: needinfo?(mconley)

I think the biggest thing we've identified (rather, Bas identified) is that some of our Sync client code is taking a long time to do things on the main thread: https://share.firefox.dev/407LKYA

So let's move this to Sync for now and see what the gang over there thinks.

Component: General → Sync
Flags: needinfo?(mconley)

Passing by, looks like the code that's taking a while is in tryFitItems in https://searchfox.org/mozilla-central/source/services/sync/modules/util.sys.mjs#422

Which tries to figure out the largest number of tabs we can sync without hitting the limit of the Sync server. Looks like we're taking a long time doing the calculation because it computes the size (by serializing the tabs to JSON) after removing each individual tab if it exceeds the maximum (which would take a long time if a user has a very large number of tabs)

It might an opportunity to improve this code a bit!

:skhamis you implemented these bits in the Rust client quite recently, but the code that's taking a while is in JS, do you know if that path still makes sense now that we use the Rust client?

(and maybe any improvements we make here we can make back in the Rust engine)

Flags: needinfo?(skhamis)

FWIW I'm sure we can do some smart binary search to figure out the right spot to cut the payload, but I'm not sure if we'd be over-optimizing and we don't need it :)

Yes we also implemented tryFitItems in Rust, so it's possible we can simply remove that call all-together. However something is a bit off as tryFitItems has been used for quite some time (even way before we added the Rust side of it). I'll dive in a little to ensure this is the actual problem area and removing tryFitItems is the sole culprit (from the sync perspective).

Still researching this but wanted to do a little bit of an update:

My perf link: https://share.firefox.dev/3lzqVGJ

Steps I took:

  1. Open 4,000 tabs (tested using data: schema to not worry about network hits)
  2. Lowered the "max payload size" to 0.5mb so
  3. Force closed my browser
  4. Restored session to open 4k tabs
  5. Start closing a bunch of tabs

While I'm not quite seeing the same performance hit, I'll be trying to get my browser to hit 8k tabs (which is about around the server limit, using my "test" urls). The culprit though seems to be https://searchfox.org/mozilla-central/source/services/sync/modules/util.sys.mjs#422-423 will take really long with a large amount of tabs.

Asif, do you happen to know how many tabs this profile has? If it's around the ballpark of what I did above then I believe I'm on the right track, if not there there might another culprit.

Flags: needinfo?(yoasif)

(In reply to Tarik Eshaq from comment #8)

FWIW I'm sure we can do some smart binary search to figure out the right spot to cut the payload, but I'm not sure if we'd be over-optimizing and we don't need it :)

We already try to do a "best guess" on where we need to cut off the records: https://searchfox.org/mozilla-central/source/services/sync/modules/util.sys.mjs#438. In my testing, it's quite close (give or take a couple of tabs) to where we need to be. So worst case scenarios is in the single-digits! However, trying to get this async will probably help in those scenarios.

Assignee: nobody → skhamis
Status: NEW → ASSIGNED

Sammy, while I don't have a backup of my profile (sorry!) from this time period, 8,000 tabs would not be unexpected - it is entirely possible that it is above 10k tabs.

Flags: needinfo?(yoasif)
Flags: needinfo?(skhamis)
Whiteboard: [fxsync-]
Pushed by skhamis@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6b87ed25b070
Optimize tryFitItems to better handle larger payloads r=markh,mconley
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 113 Branch

Since nightly and release are affected, beta will likely be affected too.
For more information, please visit auto_nag documentation.

The patch landed in nightly and beta is affected.
:skhamis, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox112 to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(skhamis)
Flags: needinfo?(skhamis)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: