Closed Bug 1806495 Opened 1 year ago Closed 1 year ago

vsync stops being notified while counting MIDI devices for telemetry

Categories

(Core :: DOM: Core & HTML, defect, P2)

defect

Tracking

()

VERIFIED FIXED
110 Branch
Performance Impact ?
Tracking Status
firefox-esr102 --- unaffected
firefox108 + fixed
firefox109 + fixed
firefox110 + verified

People

(Reporter: florian, Assigned: bholley)

References

(Regression)

Details

(Keywords: perf:responsiveness, perf:startup, regression)

Attachments

(7 files)

While profiling bug 1789268 to verify the fix, I noticed a pause during the animation. Here is the profile: https://share.firefox.dev/3uZMm4S

There's a 738ms gap during which no refresh driver tick happened on the content process.

When looking at what's happening in the parent process at the same time, there's a long (793ms) "startupLateIdleTask" marker. The main thread seems idle during that time.
At the beginning of that idle startup task, there's a ChromeUtils.importESModule - resource://gre/modules/PrivateBrowsingUtils.sys.mjs marker with this stack:

ChromeUtils::IdleDispatch handler []
_scheduleBestEffortUserIdleTasks/< [resource:///modules/BrowserGlue.sys.mjs:2936:37]
_scheduleBestEffortUserIdleTasks/idleTasks< [resource:///modules/BrowserGlue.sys.mjs:2919:12]
getTopWindow [resource:///modules/BrowserWindowTracker.jsm:164:14]

This points to this code added in bug 1797019.

The part of the profile at the end of the startupLateIdleTask marker looks related to audio devices: https://share.firefox.dev/3FH0qVL

It looks like even though the main thread is not blocked, the operating system stops notifying our process of vsync until we are done counting audio devices. Markus, any idea of why this could happen?

Flags: needinfo?(mstange.moz)

Can you capture a profile that includes the WindowsVsyncThread?

Flags: needinfo?(mstange.moz)

So this telemetry isn't super important and we could of course remove it. Unfortunately, various sites out there silently request MIDI access to fingerprint users, and so to avoid annoying users with prompts, we internally enumerate the devices to determine whether the user has any connected. So even if we remove this telemetry, some users will still hit this jank in the wild during ordinary unrelated browsing.

(unless it's the case that this jank is specifically related to requesting MIDI access from the parent process, in which case we can indeed solve it by removing the telemetry).

(In reply to Markus Stange [:mstange] from comment #1)

Can you capture a profile that includes the WindowsVsyncThread?

I don't know yet if I can reproduce.

(In reply to Markus Stange [:mstange] from comment #1)

Can you capture a profile that includes the WindowsVsyncThread?

This thread seems to only exist on Windows. Here is a profile with all registered threads: https://share.firefox.dev/3YzTLWd

I improved the startup idle task markers to make the profiles easier to read (apply the patch from bug 1806519 if you would like to have these markers locally).

(In reply to Florian Quèze [:florian] from comment #4)

I don't know yet if I can reproduce.

I can actually reproduce most of the time. The "startupLateIdleTask - reportHasMIDIDevices" marker typically takes 700 to 800ms on my machine. I once so it take only 25ms, and that wasn't long enough to make vsync skip a frame.

Performance Impact: --- → ?

Oh, oops, this is on macOS, for some reason I was thinking Windows.

The all-threads profile shows the reason: MIDIPlatformService::Get() is running on the IPDL background thread, blocking it for the duration of the call to MidiClientCreate: https://share.firefox.dev/3WxSA7I

This is not good; long-running or blocking tasks should not run on the PBackground thread, see this now-deleted documentation of PBackground:

The background thread is designed to be responsive (nobody is allowed to do long running computation or file I/O on it) to guarantee better latency

Vsync is delivered via the IPDL background thread.

Set release status flags based on info from the regressing bug 1797019

:nchevobbe, since you are the author of the regressor, bug 1797019, could you take a look? Also, could you set the severity field?

For more information, please visit auto_nag documentation.

Flags: needinfo?(nchevobbe)

(In reply to Release mgmt bot [:suhaib / :marco/ :calixte] from comment #7)

:nchevobbe, since you are the author of the regressor, bug 1797019, could you take a look? Also, could you set the severity field?

Looks like folks more knowledgeable that me already started looking into this, but let me know if there's anything I can do to help

Flags: needinfo?(nchevobbe)
Assignee: nobody → bholley

My plan here is the following:
(1) At startup, explicitly initialize a StaticRefPtr<TaskQueue> gMIDITaskQueue on top of the background thread pool.
(2) Bind PMIDIManager to that task queue using the same pattern as CreateFileSystemManagerParent.
(3) Update RecvHasMIDIDevice to explicitly proxy over to the task queue.
(4) Update all the assertions to ensure that all the parent MIDI stuff runs on the task queue.

Severity: -- → S3
Priority: -- → P2

The patches here should fix this bug properly, but they're far too involved to backport. As a mitigation, I propose we back bug 1797019 out on beta (or even a dot release if we do one), which will substantially reduce the frequency with which users encounter this stutter. Ryan, WDYT?

Flags: needinfo?(ryanvm)

Fine by me! CCing Dianna as well for the 108 side of things.

Flags: needinfo?(ryanvm)

The bug is marked as tracked for firefox108 (release) and tracked for firefox110 (nightly). However, the bug still has low severity.

:hsinyi, could you please increase the severity for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit auto_nag documentation.

Flags: needinfo?(htsai)
Severity: S3 → S2
Flags: needinfo?(htsai)
Pushed by bholley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/84ef3b5a18a6
Make PMIDIManager refcounted. r=nika
https://hg.mozilla.org/integration/autoland/rev/ba226492e8fd
Use a helper class to null out the weak reference in MIDIPortChild. r=nika
https://hg.mozilla.org/integration/autoland/rev/fe0bd3672dbd
Make PMIDIPort refcounted. r=nika
https://hg.mozilla.org/integration/autoland/rev/4ba470d9baca
Make PMIDIManager a top-level protocol. r=nika
https://hg.mozilla.org/integration/autoland/rev/132feb5df539
Make PMIDIPort a top-level protocol. r=nika
https://hg.mozilla.org/integration/autoland/rev/4cbb0ea04af1
Stand up a background task queue for the MIDI service. r=nika
https://hg.mozilla.org/integration/autoland/rev/06a0a304a757
Hoist the MIDI service onto a dedicated background task queue. r=nika,gsvelto

Florian, is it easy to confirm the performance issue here is resolved? It should be (we moved it to a different thread), so if it's more than 10 minutes don't bother.

Flags: needinfo?(florian)

(In reply to Bobby Holley (:bholley) from comment #26)

Florian, is it easy to confirm the performance issue here is resolved? It should be (we moved it to a different thread), so if it's more than 10 minutes don't bother.

"startupLateIdleTask - reportHasMIDIDevices" still takes about 800ms, but animations continue during that time: https://share.firefox.dev/3QJLUBV

So yes, FIXED, thanks!

Status: RESOLVED → VERIFIED
Flags: needinfo?(florian)
Regressions: 1811442
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: