Closed Bug 1675614 Opened 4 years ago Closed 3 years ago

Investigate reducing main thread paint latency by using instant refresh ticks in some scenarios

Categories

(Core :: Layout, enhancement)

enhancement

Tracking

()

RESOLVED FIXED
89 Branch
Tracking Status
firefox89 --- fixed

People

(Reporter: mstange, Assigned: mattwoodrow)

References

(Blocks 1 open bug, Regressed 1 open bug)

Details

(Keywords: perf-alert)

Attachments

(3 files)

Our regular main thread paints are currently triggered from vsync notifications: the paint happens within a refresh tick, and the refresh driver is controlled by vsync notifications. The original idea behind this is to make sure we don't paint more frequently than the vsync rate.

This means that, whenever a "paint originating event" for a paint occurs, we wait for the next vsync notification before we paint. This introduces up to 16.7ms (on a 60Hz display) of latency, depending on where exactly within the vsync interval the event happened.

This affects keyboard latency, as can be seen in the following profile: https://share.firefox.dev/3mU9o7c
The profile shows keyboard latency numbers between 23ms and 38ms. This "latency" number is the delta between the key event's native timestamp and the end of the composite. Here's the latency breakdown for one of the events, which has a latency of 29.4ms:

  • 3.1ms of latency between the system timestamp and the key event handler in the content process
  • 7.8ms wait time for the refresh tick <-- This is what this bug aims to reduce
  • 0.5ms refresh tick, which includes the paint
  • 0.4ms until the paint has arrived on the compositor thread
  • 15.7ms wait time for the next vsync
  • 1.9ms of compositing

I believe we can address the main thread latency for this use case while still making sure we only have one paint per vsync interval.


The basic implementation idea is the following:

  • If the refresh driver hasn't ticked in a while, allow an extra "early tick".
  • Otherwise, keep the current behavior.

Or, more specifically: Maintain a piece of state that says whether we've ticked in the current "vsync interval". If EnsureTimerStarted() is called at a point where we haven't yet had a tick in the current vsync interval, dispatch a high-priority runnable for the tick, and don't wait for vsync. But the next tick will need to wait for vsync, unless enough time elapses that we're in the next vsync interval. Here's some pseudocode.
This "early tick" can also be considered a "late tick" that's borrowed from the previous vsync: We didn't tick at the last vsync time, so we take the hypothetical refresh tick that could have happened at that point, and shift it over to the current time.

I think this will help reduce keyboard latency, at least in the case where no other animations are going on, when nothing else is ticking the refresh driver: The paint will happen earlier, so it will arrive at the compositor earlier, so we may catch a composite one frame earlier to present the result. For the example event from earlier, I'm hoping for this latency breakdown:

  • 3.1ms of latency between the system timestamp and the key event handler in the content process
  • no wait time for the refresh tick
  • 0.5ms refresh tick, which includes the paint
  • 0.4ms until the paint has arrived on the compositor thread
  • 6.9ms wait time for the next vsync (catching the previous vsync now!)
  • 1.9ms of compositing

for a total of 12.8ms delay, rather than the original 29.4ms. That's a reduction by 16.6ms, one full vsync interval.

Potential concerns:

  • More work due to more refresh ticks: This idea can lead to extra refresh ticks if there are multiple tick triggers within one vsync interval. The first trigger will get a refresh tick for itself, and the remaining triggers will be processed by the refresh tick at the next vsync. This is one more tick than in the past, and it causes extra work that otherwise could have been coalesced into one refresh tick: it could be an extra layout flush, or firing an extra non-coalesced mouse move event, etc.
  • Variable frame latency: Another idea behind the "paint on vsync" strategy was to ensure uniform frame latency for paints that complete within a budget of one vsync interval. Handling some paints early will impact this uniformity because the "early paints" will potentially have one less frame of latency.
  • Delayed processing of other events: If multiple paint triggers happen within one vsync interval, the early tick for the first trigger might take so long that it delays the tick that processes the rest of the triggers. For example, if three images finish decoding in quick succession, this proposal will cause the early paint to paint just the first image, and the two other images will be painted in the next paint. If the early paint is quick, then the next tick happens at vsync time and the paint for the two other images happens at the same time as it would have happened in the past. But if the early paint is slow, then the next tick is delayed, and the other two images paint later than before.

I believe the idea is worth trying despite the concerns. I'm not worried about frame uniformity, because uniformity only really matters during animations. And this proposal does not affect what happens during animations, because there's have a consistent rate of refresh ticks and no refresh tick pauses during those. The extra paint only happens at the beginning of such a stream of ticks.
Similarly, it also doesn't affect coalesced work during continuous streams of events; for example, when the mouse is moving, we will do just as much coalescing of mouse move events as before, except for the first event of the series. The only extra work happens at the beginning of the streams.
As for the "delayed follow-up paint" issue, that is the one issue that is slightly worrying me. However, it's something that can already happen depending on timing, if vsync happens to fire between the events.

Summary: Investigate reducing main thread paint latency using instant refresh ticks, in some scenarios → Investigate reducing main thread paint latency by using instant refresh ticks in some scenarios

Do you have data on how often keyboard inputs would be able to take advantage of this speedup? i.e. when a person types, will all of their keypresses get sped up (because the inter-keypress delay is high and nothing else is ticking the refresh driver)?

Good question, I don't think we have that data. But we have telemetry on keyboard latency so we should hopefully see those numbers go down once this is finished. My intuition is that it'll help on simple pages like bugzilla but not on pages with lots of ads. I've also seen profiles of the google search form where they were running requestAnimationFrame the entire time and not doing anything with it.

Assignee: nobody → matt.woodrow

I can't see a reason to overwrite the most recent refresh time of the driver with a timestamp at which we didn't tick (no tests seem to depend on it), and we need to not do this in order to be able to tell later on if we've ticked this driver for the current timer's tick.

Pushed by mwoodrow@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/24d789ec28a6
Don't set the most recent refresh time of the driver when changing timers. r=mstange
https://hg.mozilla.org/integration/autoland/rev/bd899813fd0c
Try doing a 'catch-up' refresh driver tick if we get a paint request part way into a tick, and haven't yet painted that tick. r=mstange
https://hg.mozilla.org/integration/autoland/rev/cc33a0848d85
Process spellcheck in chunks in the parent process, since it's also async. r=m_kato
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 89 Branch
Regressions: 1701168

(In reply to Pulsebot from comment #6)

Pushed by mwoodrow@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/cc33a0848d85
Process spellcheck in chunks in the parent process, since it's also async.
r=m_kato
== Change summary for alert #29476 (as of Sun, 28 Mar 2021 21:13:01 GMT) ==

Improvements:

Ratio Suite Test Platform Options Absolute values (old vs new)
10% tabpaint linux1804-64-shippable-qr e10s stylo webrender 47.15 -> 42.56

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=29476

Regressions: 1702466

Bug 1708325 extended the technique from this bug to allow one "extra" tick per frame, if a tick is requested during user input events.

Blocks: 1708325
Regressions: 1718220

I found this through the perf blog, and I think maybe this technique, or something similar, could also remove the frame of lag introduced by Bug 1375949?

Is it expected that this also affects requestAnimationFrame?
A lot of user-land scripts do use this method as a delay until next paint, e.g as a mean to throttle high frequency async events. Since this change, these will not work anymore, because now requestAnimationFrame from a non-animated document is just a setTimeout(0).

Also, now the second call to rAF in a loop will be random (on my 60Hz monitor between 10ms and 26ms). https://jsfiddle.net/bpxhug2q/

Note that Chrome also has a bug here, but in worse, their second call to rAF is not even waiting for the next painting frame, meaning they've got two callbacks for the same frame...

Yes, that's expected. Yes, the delay will be 0 on a non-animated document, on the first call. But repeated calls will still be throttled to the refresh rate. Or are you saying that you get a delay of zero even on repeated calls, if the document is not animated?

No this is indeed the behavior I see, I am saying that this is an unexpected and surprising behavior though, and that many user-land scripts do expect that a call to requestAnimationFrame indeed delays their callback until the next painting frame, even from a non-animated document.

I guess I still don't understand how that missing could have an undesirable outcome.

(In reply to Kaiido from comment #13)

many user-land scripts do expect that a call to requestAnimationFrame indeed delays their callback until the next painting frame

And that's still the case - the next paint also happens without a delay.

Regressions: 1783681
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: