Closed Bug 1640852 Opened 4 years ago Closed 3 years ago

Window.getWorkspaceID blocks the main thread on Windows

Categories

(Firefox :: Session Restore, defect, P1)

defect

Tracking

()

RESOLVED FIXED
90 Branch
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 --- wontfix
firefox76 --- unaffected
firefox77 --- wontfix
firefox78 --- wontfix
firefox79 --- wontfix
firefox80 --- wontfix
firefox88 --- wontfix
firefox89 --- wontfix
firefox90 --- fixed

People

(Reporter: florian, Assigned: bas.schouten)

References

(Regression)

Details

(Keywords: perf, regression, Whiteboard: [fxperf:p2][bhr:nsWindow::GetWorkspaceID])

Attachments

(1 file)

See this profile: https://share.firefox.dev/2ZzHl4B
This is captured on the 2018 quantum reference hardware (ie. a slow dell laptop).

Set release status flags based on info from the regressing bug 890125

This looks like it can happen pretty close to startup too, at least according to this profile, so setting it to fxperf:p2.

Whiteboard: [fxperf] → [fxperf:p2]

The severity field is not set for this bug.
:mikedeboer, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(mdeboer)

Even on my fast machine this is causing 30-50ms of jank per window whenever we save the session, when the machine is under some load (see dupe).

(In reply to Mike Conley (:mconley) (:⚙️) from comment #2)

This looks like it can happen pretty close to startup too, at least according to this profile, so setting it to fxperf:p2.

Indeed, it can happen before the end of startup. In this profile https://share.firefox.dev/3oBBfe3 it janked the main thread for 1.4s before about:home had been painted.

Please feel free to disable this feature on Windows, if this is bad enough.

Flags: needinfo?(mdeboer)

(In reply to Mike de Boer [:mikedeboer] from comment #7)

Please feel free to disable this feature on Windows, if this is bad enough.

I think the functional regression for people who use workspaces would be pretty bad; the issue is that the perf regression is there for everyone (even the vast majority who don't use workspaces at all). The problem is that there's no real way to distinguish the groups; Windows doesn't provide any APIs for the purpose...

I asked about this on slack and :mhowell had some good ideas, the main ones being:

  1. cache the COM object we use for the checks, ie the IVirtualDesktopManager instead of creating it for each call. (This seems to be like 60-80% of the cost of running this method)
  2. make the GetWindowDesktopId call async and run it off-main-thread.

Florian, do you have cycles to take this?

Flags: needinfo?(florian)

(In reply to :Gijs (he/him) from comment #8)

Florian, do you have cycles to take this?

Unlikely, and I'm also probably not the best person to tackle this (I know very little about Windows specific code). I think Windows integration code is covered by gcp's team; forwarding the needinfo.

Flags: needinfo?(florian) → needinfo?(gpascutto)

We either have no cycles or we can put 3 people on this 24/7 (ok maybe not quite...), the only question is how important this is. The bug has been open for 5 months and has no priority/severity, so from my perspective this doesn't look more important then the work we have planned until H1 2021.

If this is incorrect, please explain the impact and tag the bug appropriately, we'll re-prioritize accordingly.

Flags: needinfo?(gpascutto) → needinfo?(mdeboer)
Severity: -- → S2
Flags: needinfo?(mdeboer)
Priority: -- → P1

(For a perhaps more appropriate assessment of the severity here, I leave to :Gijs and/ or :Florian)

(In reply to Gian-Carlo Pascutto [:gcp] from comment #10)

please explain the impact

This bug is reponsible for about:

  • 5 to 10% of the total startup time (time from when the parent process is created to when about:home has been painted). I have these numbers from startup profiles (eg. in this profile from a recent Nightly, it's 1.1s, out of a 15s startup, so a bit more than 7% of the time: https://share.firefox.dev/350900T)
  • about 1% of the reported hang time (we have this data from BHR, ie. Telemetry).
Whiteboard: [fxperf:p2] → [fxperf:p2][bhr:nsWindow::GetWorkspaceID]
See Also: → 1675724

Looking at this a bit more, I've found another duplicate (bug 1669357) that contains an interesting profile showing how this gets in the way while browsing: https://share.firefox.dev/3p5s2e2
This duplicate bug links to bug 1645473 that has some ideas about what we should do, and that seem to go in the same direction as comment 8 here, but more detailed.

While looking for a stack frame I could use as a BHR signature to put in the whiteboard, I've also noticed that we have a similar issue on Mac, and filed bug 1675724.

See Also: → 1645473

10% of the total startup time

Sold.

Assignee: nobody → cmartin

(In reply to Gian-Carlo Pascutto [:gcp] from comment #15)

10% of the total startup time

Sold.

❤️

No longer blocks: 1664067
Assignee: cmartin → bas
Status: NEW → ASSIGNED

Created a simple (fairly hacky) patch to address this issue, it both cashes the VirtualDesktopManager and constructs it off the main thread to take it off the critical path for startup. It seems to work fine for me locally. In case anyone wants to try for themselves, it's building on try: https://treeherder.mozilla.org/jobs?repo=try&revision=b15535655f3f5d31b2e73753b155f692cb90b23b

Pushed by bschouten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/72258160ccad
Grab IVirtualDesktopManager object off the main thread on the creation of the first nsWindow. r=mikedeboer,aklotz
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 90 Branch
Regressions: 1708480
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: