Sluggish scrolling if the gecko event loop is busy (native scroll events no longer get coalesced properly by macOS)
Categories
(Core :: Widget: Cocoa, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr78 | --- | unaffected |
firefox-esr91 | --- | unaffected |
firefox92 | --- | unaffected |
firefox93 | --- | unaffected |
firefox94 | + | wontfix |
firefox95 | --- | fixed |
People
(Reporter: jrmuizel, Assigned: jrmuizel)
References
(Regression)
Details
(Keywords: regression)
Attachments
(4 files)
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta-
|
Details | Review |
48 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta-
|
Details | Review |
Check out the "APZ payload presented markers" on the compositor thread https://share.firefox.dev/3AlvLc3
Comment 1•3 years ago
|
||
Most of that latency comes from the "native event time stamp -> main thread sees native event" part of the pipeline. You can see high latency on the DOMEvent markers on the main thread: https://share.firefox.dev/3Dnfrcy
So then the question is, what kind of load are we producing so that the system can't deliver events to us fast enough? Or is there a problem with the event loop?
Updated•3 years ago
|
Assignee | ||
Comment 2•3 years ago
|
||
I regression windowed this and bug 1690687 is to blame. It seems like we're probably starving the native event loop.
[Tracking Requested - why for this release]:
This regression can make Firefox noticeably less responsive.
Comment 3•3 years ago
|
||
Set release status flags based on info from the regressing bug 1690687
Comment 4•3 years ago
•
|
||
Steps to reproduce:
- In a fresh profile, enable devtools.chrome.enabled.
- Open the browser console.
- Paste the following and press enter:
var geckoEventLoopClogger = (function () {
let messageName = 'ping-pong';
let workerJs = `
onmessage = e => {
postMessage(e.data);
};
`;
let blobUrl = URL.createObjectURL(new Blob([workerJs]));
let worker;
function handleMessage(event) {
let startTime = performance.now();
while (performance.now() - startTime < 14) {}
worker.postMessage(event.data);
}
function start() {
if (worker) {
stop();
}
worker = new Worker(blobUrl);
worker.addEventListener('message', handleMessage);
worker.postMessage('ping-pong');
}
function stop() {
if (!worker) {
return;
}
worker.removeEventListener('message', handleMessage);
worker.terminate();
worker = undefined;
}
return { start, stop };
})();
- Type
geckoEventLoopClogger.start()
and press enter. - In the main browser window, scroll with the touchpad or the magic mouse, up and down, continuously.
- On the browser console, execute
geckoEventLoopClogger.stop()
Expected results:
Scrolling should be responsive and not lag very far behind.
Actual results:
Scrolling feels very sluggish and falls behind more and more.
Profile before bug 1690687 (build 2021-09-05): https://share.firefox.dev/3AoLWp3
Profile after bug 1690687 (build 2021-09-08): https://share.firefox.dev/2WZWWvu
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 5•3 years ago
|
||
I believe this is caused by AcquireFirstMatchingEventInQueue
breaking CGEvent coalescing. I think we were previously often pulling out the dummy event, which may have prevented AcquireFirstMatchingEventInQueue
from interfering with the coalescing.
Assignee | ||
Updated•3 years ago
|
Comment 6•3 years ago
|
||
I'm not going to be able to work on this anytime soon.
Assignee | ||
Comment 7•3 years ago
|
||
Ah, I wasn't expecting you to. Just thinking you might remember some things from bug 996848.
Here's the patch that seems to avoid the problem:
https://treeherder.mozilla.org/jobs?repo=try&revision=994d02d0eff88ee8003ceaa8aefd1ba9da3684f8&selectedTaskRun=civiSeINTYWqch7VCyyXMg.0
It just skips all of the calls to AcquireFirstMatchingEventInQueue
. Steven, do you recall why they're needed?
Comment 8•3 years ago
•
|
||
The code you're changing is in bool nsAppShell::ProcessNextNativeEvent(bool aMayWait)
. The change you have in mind basically chops off the second part of the if (aMayWait)
conditional. With this change, nsAppShell::ProcessNextNativeEvent()
will never process a native event while aMayWait
is false
.
I don't know whether this will make trouble. It might even be an improvement. But I have a suggestion that's less radical and might still solve your problem:
diff --git a/widget/cocoa/nsAppShell.mm b/widget/cocoa/nsAppShell.mm
--- a/widget/cocoa/nsAppShell.mm
+++ b/widget/cocoa/nsAppShell.mm
@@ -693,7 +693,7 @@ bool nsAppShell::ProcessNextNativeEvent(
EventRef currentEvent =
AcquireFirstMatchingEventInQueue(currentEventQueue, 0, NULL, kEventQueueOptionsNone);
if (!currentEvent) {
- continue;
+ break;
}
EventAttributes attrs = GetEventAttributes(currentEvent);
UInt32 eventKind = GetEventKind(currentEvent);
Your problem isn't caused by calling AcquireFirstMatchingEventInQueue()
directly -- there's another call (below) that your patch doesn't remove. The real question is why I avoided calling [NSApp nextEventMatchingMask:...]
and [NSApp sendEvent:]
when aMayWait
is false. I no longer have a clear memory of what I was thinking when I wrote my patch for bug 996848. But I left copious notes there and in my source code. As best I can tell, the best overall summary is in this code comment.
Comment 9•3 years ago
|
||
Where (in macOS system code) does "CGEvent coalescing" happen?
I now have much better debugging tools than I did when I wrote my patch for bug 996848 (HookCase and the techniques I've developed for using it). At some point I'd like to dig into this code, and see what might interfere with it (for example not spinning the native event loop when aMayWait
is false).
Comment 10•3 years ago
|
||
Oops, I just noticed a bug in my patch for bug 996848, which should result in the leaking of EventRef
s, and possibly other problems. Here's the fix:
diff --git a/widget/cocoa/nsAppShell.mm b/widget/cocoa/nsAppShell.mm
--- a/widget/cocoa/nsAppShell.mm
+++ b/widget/cocoa/nsAppShell.mm
@@ -730,8 +730,17 @@ bool nsAppShell::ProcessNextNativeEvent(
} while (mRunningEventLoop);
if (eventProcessed) {
- moreEvents = (AcquireFirstMatchingEventInQueue(currentEventQueue, 0, NULL,
- kEventQueueOptionsNone) != NULL);
+ EventRef nextEvent =
+ AcquireFirstMatchingEventInQueue(currentEventQueue, 0, NULL,
+ kEventQueueOptionsNone);
+ if (nextEvent) {
+ // This call to ReleaseEvent() matches a call to RetainEvent() in
+ // AcquireFirstMatchingEventInQueue().
+ ReleaseEvent(nextEvent);
+ moreEvents = true;
+ } else {
+ moreEvents = false;
+ }
}
mRunningEventLoop = wasRunningEventLoop;
Assignee | ||
Comment 11•3 years ago
|
||
(In reply to Steven Michaud [:smichaud] (Retired) from comment #8)
Your problem isn't caused by calling
AcquireFirstMatchingEventInQueue()
directly -- there's another call (below) that your patch doesn't remove.
Setting eventProcessed = false
avoids the second call to AcquireFirstMatchingEventInQueue()
as well.
(In reply to Steven Michaud [:smichaud] (Retired) from comment #9)
Where (in macOS system code) does "CGEvent coalescing" happen?
AcquireFirstMatchingEventInQueue()
calls PullEventsFromWindowServer()
which calls Convert1CGEvent()
. This pulls an event out of sCGEventQueue
into the Carbon event queue.
The coalescing happens in PushToCGEventQueue
it will coalesce a wheel event with the sLastMouseCGEvent
Comment 12•3 years ago
|
||
Setting
eventProcessed = false
avoids the second call toAcquireFirstMatchingEventInQueue()
as well.
Yes, when nsAppShell::ProcessNextNativeEvent()
is called with aMayWait
== false
. But it is called when aMayWait
and eventProcessed
are true
.
AcquireFirstMatchingEventInQueue()
callsPullEventsFromWindowServer()
which callsConvert1CGEvent()
. This pulls an event out ofsCGEventQueue
into the Carbon event queue.
So you're saying that pulling a CGEvent
out of the sCGEventQueue
prevents it from being coalesced with another CGEvent
?
The coalescing happens in
PushToCGEventQueue
it will coalesce a wheel event with thesLastMouseCGEvent
What calls PushToCGEventQueue
?
Assignee | ||
Comment 13•3 years ago
|
||
(In reply to Steven Michaud [:smichaud] (Retired) from comment #12)
Setting
eventProcessed = false
avoids the second call toAcquireFirstMatchingEventInQueue()
as well.Yes, when
nsAppShell::ProcessNextNativeEvent()
is called withaMayWait
==false
. But it is called whenaMayWait
andeventProcessed
aretrue
.
True. Perhaps that's uncommon enough that the problem won't be noticeable.
AcquireFirstMatchingEventInQueue()
callsPullEventsFromWindowServer()
which callsConvert1CGEvent()
. This pulls an event out ofsCGEventQueue
into the Carbon event queue.So you're saying that pulling a
CGEvent
out of thesCGEventQueue
prevents it from being coalesced with anotherCGEvent
?
I believe so yes.
The coalescing happens in
PushToCGEventQueue
it will coalesce a wheel event with thesLastMouseCGEvent
What calls
PushToCGEventQueue
?
PullEventsFromWindowServerOnConnection
which is called by PullEventsFromWindowServer
. (This seems to be mostly happening on a separate NSEventThread
thread now)
Comment 14•3 years ago
•
|
||
I'm looking at several Apple frameworks in a disassembler (HIToolbox, CoreGraphics, SkyLight, CoreFoundation), trying to find a way to discover whether or not an event is waiting for delivery without either spinning the event loop or pulling it out of the sCGEventQueue
. So far no luck. But these things tend to be very complicated, and I don't really have the time now.
We can't spin the event loop when aMayWait
is false. But that may not leave us another way to query for waiting events (or to process them) that doesn't cause other problems. So it might be worthwhile testing a patch on the trunk that gets rid of the second half (the false
half) of the if (aMayWait)
conditional. It should bake there (and potentially on the other branches) for a long time. And I think it should be backed out again if there are any problems.
Maybe by then I'll have time to really work on this bug.
Comment 15•3 years ago
|
||
By the way, I've written a HookCase hook library that covers a lot of this ground, and (if I may say so myself) is the best general reference for macOS event processing that I've ever seen:
https://github.com/steven-michaud/HookCase/blob/master/examples-events.md
It doesn't currently cover event coalescing, but I'll try to add that later.
Comment 16•3 years ago
|
||
I'd still like you to try my suggestion from comment #8. The results may be good enough to muddle through without making major changes to nsAppShell::ProcessNextNativeEvent()
.
Updated•3 years ago
|
Assignee | ||
Comment 17•3 years ago
|
||
I tried out your suggestion from comment 8 and it did not fix the issue. Since AcquireFirstMatchingEventInQueue()
is still being called it seems like we'd be still pulling an event out of sCGEventQueue
which will prevent the coalescing.
Comment 18•3 years ago
|
||
I hoped that it would be called less often. Oh well. Thanks for trying.
Assignee | ||
Comment 19•3 years ago
|
||
We'll already leave some events that we get back from
AcquireFirstMatchingEventInQueue in the queue. This change makes it so
we always do that without even looking at the queue. It's not completely
clear what impact this will have, so let's just keep it Nightly only
for now.
Updated•3 years ago
|
Comment 20•3 years ago
|
||
Avoid calling AcquireFirstMatchingEventInQueue when aMayWait is true
This should actually be:
Avoid calling AcquireFirstMatchingEventInQueue when aMayWait is false
Updated•3 years ago
|
Comment 21•3 years ago
|
||
Comment 22•3 years ago
|
||
Backed out changeset 8b7ed045dbd1 (Bug 1734705) for causing build bustages on nsAppShell.mm.
Backout link
Push with failures
Failure Log
Comment 23•3 years ago
|
||
Comment 24•3 years ago
|
||
bugherder |
Assignee | ||
Updated•3 years ago
|
Comment 25•3 years ago
•
|
||
What kind of baking/testing/verification does this need before we nominate for Beta uplift? I guess we'd need another patch making it not Nightly-only when the time comes too.
Comment 26•3 years ago
•
|
||
As far as I'm concerned, I think it just needs lots of exposure to many different kinds of users. This argues for extended baking on both trunk and beta (possibly for longer than a single update cycle on each). I have no idea what kinds of problems might occur, and so have no specific ideas how to test for them. Jeff might have some ideas, though.
Edit: On second thought, it'd be worthwhile to test for native event starvation (if we can figure out how to do that). The current patch stops all processing of native events by nsAppShell::ProcessNextNativeEvent()
while aMayWait
is false.
Comment 27•3 years ago
|
||
Are we going to want to ship bug 1690687 in Fx94 if we don't fix this bug there?
Comment 28•3 years ago
•
|
||
For what it's worth, I just performed a very primitive test for native event starvation in today's mozilla-central nightly (build id 20211012093200, which has this patch). I didn't see any problems.
I dragged objects around on Firefox's default start page. I also tested dragging the native color picker (and colors that I'd selected with it), using the testcase for bug 1618418.
Edit: I tested on macOS 10.15.7 build 19H1419 (the current version).
Assignee | ||
Comment 29•3 years ago
|
||
This will let us get some more testing.
Assignee | ||
Comment 30•3 years ago
|
||
We'll already leave some events that we get back from
AcquireFirstMatchingEventInQueue in the queue. This change makes it so
we always do that without even looking at the queue. It's not completely
clear what impact this will have, so let's just keep it Nightly only
for now.
Comment 31•3 years ago
|
||
Assignee | ||
Comment 32•3 years ago
|
||
Comment on attachment 9245781 [details]
Bug 1734705. Avoid calling AcquireFirstMatchingEventInQueue when aMayWait is false.
Beta/Release Uplift Approval Request
- User impact if declined: Fixes the sluggish scrolling caused by bug 1690687.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: Yes
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): It was somewhat risky but the code is currently marked #ifdef EARLY_BETA_OR_EARLIER so it shouldn't cause any risk to release yet. I'm still trying to figure out whether it makes sense to back out bug 1690687 of beta yet, but in the mean time this will help us get more testing.
- String changes made/needed:
Comment 33•3 years ago
|
||
bugherder |
Comment 34•3 years ago
|
||
Comment on attachment 9245781 [details]
Bug 1734705. Avoid calling AcquireFirstMatchingEventInQueue when aMayWait is false.
Approved for 94.0b6. I will note, however, that 94.0b6 is our final early beta this cycle. Once 94.0b7 ships on Sunday night, we'll be back to where we started here.
Comment 35•3 years ago
|
||
bugherder uplift |
Assignee | ||
Comment 36•3 years ago
|
||
Assignee | ||
Comment 37•3 years ago
|
||
Comment on attachment 9246238 [details]
Bug 1734705. Let bug 1734705 ride out to release.
Beta/Release Uplift Approval Request
- User impact if declined: Fixes the sluggish scrolling caused by bug 1690687.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: Yes
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): There's definitely some risk to this, however, the code path is very hot and used for everything on macOS. This has been on Nightly for about a week and in beta for a day or two. The alternative is to back out 1690687.
- String changes made/needed:
Comment 38•3 years ago
|
||
Comment 39•3 years ago
|
||
bugherder |
Comment 40•3 years ago
|
||
Comment on attachment 9245781 [details]
Bug 1734705. Avoid calling AcquireFirstMatchingEventInQueue when aMayWait is false.
We decided to backout bug 1690687 from Fx94, so this is no longer necessary.
Updated•3 years ago
|
Comment 41•3 years ago
|
||
backout |
(In reply to Ryan VanderMeulen [:RyanVM] from comment #35)
https://hg.mozilla.org/releases/mozilla-beta/rev/06d37007e7d0
Backed out from 94.0b8:
https://hg.mozilla.org/releases/mozilla-beta/rev/a6e06f1f7110
Updated•3 years ago
|
Updated•3 years ago
|
Description
•