Bug 1866145 (Open) - Opened 1 year ago, Updated 4 months ago

Delay opening new windows due to long GC slices in the parent process

Categories

(Core :: JavaScript: GC, defect, P3)

People

(Reporter: dholbert, Unassigned, NeedInfo)

References

(Blocks 1 open bug)

Details

(Keywords: perf:responsiveness)

I just noticed that opening a new window was quite slow for me, so I repeated the process and captured it in a performance profile:
https://share.firefox.dev/3uoeKk1

If you look in the marker view, you can see there's a TelemetryStopwatch marker for FX_NEW_WINDOW_MS which is 745ms, or 3/4 of a second, which represents the delay that we're hitting when opening that new window.

Most of that delay seems to be occupied by GC, which happens synchronously as part of several operations that we do. The most substantial portion is the 391ms-long deactivate event, which is spent entirely in the JS inferFromText function, and it's nearly all GC. Maybe that function is creating a lot of JS garbage? (Maybe proportional to the number of tabs I have open in the window that was in the foreground when I pressed Ctrl+N -- that window has quite a lot of tabs, probably on the order of ~1000.)

Summary: one-second delay on opening new windows, mostly spent doing GC → one-second delay on opening new windows, mostly spent doing GC in "inferFromText"

In case it's relevant, it looks like I also have ~80 Firefox windows open (yikes). Most of them don't have many tabs, though. :)

Just mentioning it in case the JS garbage here is actually proportional to the number of open windows rather than (or in addition to) the number of open tabs.

Hm. I'm not a GC expert by any means, but might it be possible that, rather than the inferFromText function creating a bunch of garbage, something else created some garbage recently, and the JS engine is taking the opportunity to do a GC before invoking inferFromText? From what I can tell from this profile (and I might be misinterpreting it), the GC occurs before inferFromText actually runs.

In that case, I'd be curious to see an about:memory report and the memory track in a profile during one of these situations.

Performance Impact: --- → ?

(In reply to Mike Conley (:mconley) (:⚙️) from comment #2)

In that case, I'd be curious to see an about:memory report and the memory track in a profile during one of these situations.

The memory track at least is there in comment 0's profile (just not shown by default). Here's the profile with that track made visible:
https://share.firefox.dev/46qXpUV

That seems to confirm your suspicion that the memory predated inferFromText, rather than being generated during it.

Here's another profile that I generated just now, where GC doesn't seem to be a significant factor, and inferFromText doesn't show up at all:
https://share.firefox.dev/3SVkk7K
In this profile, FX_NEW_WINDOW_MS is 546ms, which is better than the one in comment 0 (due to there not being an unfortunately-timed GC this time); but that still feels a bit long.

Component: Toolbars and Customization → General
Summary: one-second delay on opening new windows, mostly spent doing GC in "inferFromText" → Half-second to one-second delay on opening new windows
See Also: → 1867161

Thanks for the bug, Daniel! So this behavior survives restarts? If so, what happens if you restart in safe mode?

Flags: needinfo?(dholbert)

Re: does this survive restarts: yes and no. It's a bit sporadic. I'm wondering if this is just me occasionally getting unlucky and having expensive operations coincide with new-window operations (e.g. the GC in the original comment)?

I just captured a perf profile right now (after a recent browser restart) with 4 new-window operations. The first had a FX_NEW_WINDOW_MS of 550ms, and then there were two that were ~240ms and one that was 284ms. Here's that profile: https://share.firefox.dev/47IuOfy

A few things that I notice:
Specifically with today's perf-profile, during the longest FX_NEW_WINDOW_MS operation, looking at the main thread in the parent process:
(1) Task places::NotifyIconObservers occupies 180ms
(2) services-sync (i.e. Firefox Sync operations) occupies 33ms
I suspect those two ~expensive operations (plus maybe some related overhead that's alongside them) are chiefly responsible for the difference between this long new-window operation vs. the shorter ones.

For completeness, here are some other sections that are responsible for about the same amount of time in all of the new-window operations in all of the performance profiles here (the one from comment 0, second one in comment 3, and the one in this comment here):

  • 40ms in L10nReadyHandler::ResolvedCallback
  • 30ms in ExtensionParent.sys.mjs doing recvAPICall
  • 30ms in nsWindow::UpdateMozWindowActive
  • 19ms in mozJSSubScriptLoader::LoadSubScript (38ms in my longest new-window operation from today's profile)
  • 11ms in RestyleManager::ProcessRestyledFrames
  • 10ms in NS_NewXULElement
  • 9ms in gtk_widget_render to paint the new window
  • 4ms in resource:///modules/BrowserUsageTelemetry.sys.mjs (mostly getOpenTabsAndWinsCounts)

These add up to 153ms, and are the main buckets that I could find of identifiable always-performed work in all of these operations. Maybe these (plus a bit more) represent a ~baseline for just-how-long-it-takes?

(Some of these, e.g. the restyle and UpdateMozWindowActive operations, might be variable-cost -- scaling with how many tabs you've got open in the newly-backgrounded window which need to be repainted with an unfocused color, perhaps.)

Flags: needinfo?(dholbert)

A few more notes, comparing to a fresh profile:

  • In a fresh profile on the same machine, my FX_NEW_WINDOW_MS times are 150-180ms. (A bit shorter than the "not unlucky" times from my regular browsing profile, but in the same ballpark; close to 200ms.)

  • Regarding the "always-performed work" noted in previous comment for my regular user-profile, as compared to a fresh profile:

  • 30ms in ExtensionParent.sys.mjs doing recvAPICall

I think this is associated with my Bitwarden add-on, based on there being nearby activity in the webextensions process from Bitwarden.

  • 30ms in nsWindow::UpdateMozWindowActive
  • 11ms in RestyleManager::ProcessRestyledFrames
  • 4ms in resource:///modules/BrowserUsageTelemetry.sys.mjs (mostly getOpenTabsAndWinsCounts)

These ^ buckets barely register (0-2ms) for a new-window in a fresh profile, so these are all probably work that's proportional to the number of open tabs in what-was-previously-the-foreground-window.

Everything else from the list of "always-performed work" is present in about the same duration in a fresh profile.

Bottom line: I think this bug may not be super actionable; it appears that the key things resulting in >half-second new-window operations are other work that happened to be scheduled at the same time, due to bad luck -- whether favicon updates, sync operations, or pending GC work that got flushed (possibly GC'ing for the memory increases that I observed in bug 1849393 comment 14).

Beyond those, there's not a particularly massive difference between new-window times for a fresh profile vs. my regular browsing profile (times in the neighborhood of 150-300ms). So unless there's a way (and enough motivation) to somehow preempt/prevent any pending unrelated tasks in the service of making Ctrl+N fast, there's maybe not a lot we can do here (aside from making the background tasks themselves less expensive).

Not sure whether to resolve as INVALID, or reclassify as S4, or something else. I'll defer to mconley/dmose regarding what (if anything) would be useful to do with this bug.

(In reply to Daniel Holbert [:dholbert] from comment #5)

looking at the main thread in the parent process:
(1) Task places::NotifyIconObservers occupies 180ms

I spun off bug 1867842 for this^. (I also noticed that this Task places::NotifyIconObservers is actually slightly longer than 180ms -- it's 216ms -- but 180ms is the portion that overlaps with the FX_NEW_WINDOW_MS marker.)

The severity field is not set for this bug.
:mossop, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(dtownsend)

The specific cause here might not be actionable, but I do wonder if it might make sense to be able to defer background things like GC during operations that we think are important for user experience, like opening new windows, new tabs, etc. Obviously if we're running out of memory we may have no choice, and perhaps it isn't trivial to know the difference.

Severity: -- → S3
Flags: needinfo?(dtownsend)
Summary: Half-second to one-second delay on opening new windows → Background operations may drastically slow down opening new windows

The Bugbug bot thinks this bug should belong to the 'Core::JavaScript: GC' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: General → JavaScript: GC
Product: Firefox → Core
Flags: needinfo?(jcoppeard)

Renaming the bug, per comment 10.

Performance Impact: ? → pending-needinfo
Summary: Background operations may drastically slow down opening new windows → Consider deferring GCs when not under memory pressure to idle moments

Normally GC happens incrementally in slices that we try to run in idle time.

From the profile there are a few GC related things going on here:

  1. we're running very long slices (100ms)
  2. we're triggering non-incremental COMPARTMENT_REVIVED GCs that take a long time
  3. GC is taking a long time relative to the heap size

It's also worth pointing out that this problem can occur even when there is little GC activity, as per comment 3.

  1. I don't know why we're triggering such long slices. The budget is calculated in CCGCScheduler::ComputeInterSliceGCBudget so I guess it must be coming from there. There is provision there to increase the budget if the CC has been blocked from running, but I don't see how this produces 100ms budgets with the default settings. Steve, do you know what's going on here?

  2. 'Compartment revived' GCs are there to make sure we clean up zones that are actually dead but were accidentally held alive by barriers, transplants, or some other mechanism. They are non-incremental, and this can be problematic. We can investigate whether these are being triggered appropriately and whether they do in fact result in cleaning up dead zones.

  3. These GCs seem to be taking a long time relative to the size of the heap. This is obviously a really inexact comparison, but locally I see GCs of 35ms for a 100MB heap whereas here we're getting 500ms for a 300MB heap.

Flags: needinfo?(jcoppeard)
Summary: Consider deferring GCs when not under memory pressure to idle moments → Delay opening new windows due to long GC slices in the parent process
Priority: -- → P3

(In reply to Jon Coppeard (:jonco) from comment #13)
Do you know how CCGCScheduler::ComputeInterSliceGCBudget ends up generating 100ms slice budgets here?

Flags: needinfo?(sphink)

(In reply to Jon Coppeard (:jonco) from comment #14)

(In reply to Jon Coppeard (:jonco) from comment #13)
Do you know how CCGCScheduler::ComputeInterSliceGCBudget ends up generating 100ms slice budgets here?

Assuming dholbert hasn't modified javascript.options.mem.gc_incremental_slice_ms, this is an idle-time GC where the aDeadline passed in is 100ms in the future. It is passed in from the bowels of the idle scheduling code.

As a result of looking at this, I notice that these idle slices are not interruptible (because they are long, and therefore the scheduler code infers that the GC is running too long and the slices are being lengthened). Filed bug 1896037.

Flags: needinfo?(sphink)

Do you know how CCGCScheduler::ComputeInterSliceGCBudget ends up generating 100ms slice budgets here?

Assuming dholbert hasn't modified javascript.options.mem.gc_incremental_slice_ms, this is an idle-time GC where the aDeadline passed in is 100ms in the future. It is passed in from the bowels of the idle scheduling code.

Er, wait. It looks like the profiler markers annotate whether a slice is idle or not. So by that, these are not idle slices and my logic is wrong. dholbert, do/did you have any javascript.options.mem prefs set to nondefault values?

Flags: needinfo?(dholbert)

(In reply to Steve Fink [:sfink] [:s:] from comment #16)

dholbert, do/did you have any javascript.options.mem prefs set to nondefault values?

I don't currently have any nondefault values for those, no (and I don't recall ever manually modifying/resetting those prefs).

Flags: needinfo?(dholbert)

FWIW since the existing profiles here are many months old, I just recorded another profile of me opening a new window, where it happened to be slow when I was profiling (yay!):
https://share.firefox.dev/3wlhxfs

There's a FX_NEW_WINDOW_MS marker that's 3.7s long, much worse than the one in comment 0 here.

The memory track shows a long GC, but we seem to be idle for much of the time (pthread_cond_wait), so maybe we're not really actively GC'ing?

I do see a slow 127.0.0.1 network load associated with some BrowserStack tab that I have open; those seem to coincide with my long new-window marker. I'm not sure if those loads are the-things-being-slow vs. if they're just also blocked on whatever else was holding things up.

(In reply to Daniel Holbert [:dholbert] from comment #18)

The memory track shows a long GC, but we seem to be idle for much of the time (pthread_cond_wait), so maybe we're not really actively GC'ing?

Yes, that's a 5.3s GCMajor marker, but that's just the time from start to finish; it means we're in "incremental GC mode", while the browser runs at mostly full speed during that time. It spent 324ms doing actual GC work within that range, and the longest single pause was 100ms. The pthread_cond_wait isn't during actual GC time.

324ms is quite a bit, but it kind of disappears in the disaster that is that profile. Someone who understands what is going on should look at this. Given the way pthread_cond_wait is being treated as active time instead of idle time (which might be valid since it seems like things might be waiting for it?), I'm going to tag Markus for now. Especially since he may know what's up with the slowness too.

As for the current title of this bug, that's sort of happening already now. Most of our GC work happens in idle time. Mainthread GC work during non-idle time now means one or more of (1) things are so busy there's no idle time, (2) things are active enough that the GC falls behind, or (3) things are allocating fast enough that we're running out of memory.

I'd still like to understand where the 100ms is coming from. I think it's "normal", but I need to understand why. (When I profile my current browser, I don't see any of those 100ms budgets. It's mostly interruptible 5ms slices, or sometimes up to 20ms or so. Or 49ms idle slices.)

Flags: needinfo?(mstange.moz)

(In reply to Steve Fink [:sfink] [:s:] from comment #20)

324ms is quite a bit, but it kind of disappears in the disaster that is that profile

I agree, this profile deserves its own metabug. It looks very different from the profile that this bug was filed about.

Given the way pthread_cond_wait is being treated as active time instead of idle time (which might be valid since it seems like things might be waiting for it?)

The time in pthread_cond_wait definitely contributes to the jank because we're blocking in a synchronous IPC message (PWebRenderBridge::Msg_EnsureConnected).

Especially since he may know what's up with the slowness too.

  • Lots of windows
  • 30ms compositing GPU time per window
  • The profiler being started causes paints in all windows to turn the toolbar button blue
  • Some really bad behavior with occlusion detection - we seem to be recreating EGL surfaces for all windows, twice
  • Occlusion detection causing synchronous composites via "SendResume"
  • Session restore writing big blobs of JSON
  • Lots of SidebarItemChanged notifications

Flags: needinfo?(mstange.moz)

Based on comment 21, it seems that this should be broken down into a meta bug and likely moved to Core::Performance. @mstange / @dholbert, would you mind creating the breakout bugs? Once we have the individual bugs within a meta, we can individually determine the impact. I will remove the impact field from this bug.

Performance Impact: pending-needinfo → ---
Flags: needinfo?(mstange.moz)
Flags: needinfo?(dholbert)