Closed Bug 1100920 Opened 5 years ago Closed 5 years ago

Add telemetry probes for frame handling latency

Categories

(Core :: Graphics: Layers, defect)

RESOLVED FIXED
mozilla36

People

(Reporter: avih, Assigned: avih)

Attachments

(1 file)

Right now we have a few probes for event latency and hangs, but they have relatively high thresholds, which are intended to capture only meaningful jank.

This bug is about being able to probe, per frame, how far we are from handling it on time.

While this probe does access telemetry more frequently than our other probes, my few local measurements didn't detect a performance regression, even in ASAP mode, where the actual frame-handling throughput limit is reflected.
Attached patch: bug1100920.patch
Note that the probe only measures the "high precision timer" (in vsync or fixed timer mode), i.e. when we have a frame callback from the refresh driver which should be handled on time.

Frame latencies of more than ~11-14 ms probably indicate a missed frame (at a 60 Hz display rate, where the frame interval is ~16.7 ms).

This probe will let us measure how many of our frames are late, and from that deduce what percentage of frames are not handled on time.
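
As a rough illustration of the measurement described above (not the code in the attached patch), the sketch below computes a per-frame delay from scheduled and actual handling times, then the fraction of frames that exceed a lateness threshold. The function names, the sample timestamps, and the 11 ms threshold (the lower end of the ~11-14 ms range) are assumptions made for this example.

```python
# Illustrative sketch only; all names and sample values are hypothetical.
FRAME_INTERVAL_MS = 1000.0 / 60.0  # 60 Hz display rate, as assumed above


def frame_delays_ms(scheduled_ms, handled_ms):
    """Return, per frame, how long after its scheduled time it was handled."""
    return [max(0.0, handled - scheduled)
            for scheduled, handled in zip(scheduled_ms, handled_ms)]


def late_fraction(delays_ms, threshold_ms):
    """Return the fraction of frames whose delay exceeds threshold_ms."""
    if not delays_ms:
        return 0.0
    late = sum(1 for delay in delays_ms if delay > threshold_ms)
    return late / len(delays_ms)


# Six frames scheduled one frame interval apart, with made-up handling delays.
scheduled = [i * FRAME_INTERVAL_MS for i in range(6)]
offsets = [0.5, 1.0, 12.5, 0.2, 15.0, 3.0]
handled = [s + o for s, o in zip(scheduled, offsets)]

delays = frame_delays_ms(scheduled, handled)
fraction = late_fraction(delays, threshold_ms=11.0)
print(f"{fraction:.0%} of frames were late")
```

In the real probe the delays would be accumulated into a telemetry histogram rather than kept in a list; the list here only keeps the example self-contained.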
Attachment #8524581 - Flags: review?(roc)
https://hg.mozilla.org/mozilla-central/rev/163a9cee5436
Assignee: nobody → avihpit
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
Robert, telemetry data has started showing up: http://mzl.la/1zRFtOP

It would appear that the 95th percentile is ~14ms late. Since the scheduling is done before the current frame is handled, that would typically leave only about 2.6ms of the ~16.7ms frame interval to complete handling of the current frame, which I think is usually not quite enough.

In other words, 1/20 of our frames are late (but even if it's 1/60, it's still 1 frame late for every second of animation, on average).

Do you find the conclusion roughly valid?

Do you think these numbers are to be trusted? They seem very high to me...
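
The arithmetic behind these estimates can be checked with a short sketch. It assumes a 60 Hz refresh rate and takes the ~14 ms 95th percentile quoted above as given; the variable and function names are made up for the example.

```python
# Back-of-the-envelope check; the inputs are the approximate figures quoted above.
REFRESH_HZ = 60
frame_interval_ms = 1000.0 / REFRESH_HZ  # about 16.67 ms per frame at 60 Hz

p95_delay_ms = 14.0  # approximate 95th percentile from the telemetry dashboard
remaining_ms = frame_interval_ms - p95_delay_ms  # time left within the frame


def late_frames_per_second(late_frame_fraction, refresh_hz=REFRESH_HZ):
    """Average number of late frames per second of animation."""
    return late_frame_fraction * refresh_hz


print(f"frame interval: {frame_interval_ms:.2f} ms")
print(f"time left after a 14 ms delay: {remaining_ms:.2f} ms")
print(f"late frames/s at 1/20 late: {late_frames_per_second(1 / 20):.1f}")
print(f"late frames/s at 1/60 late: {late_frames_per_second(1 / 60):.1f}")
```

Under these assumptions, a late fraction of 1/20 corresponds to about three late frames per second of animation, and 1/60 to about one.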
Flags: needinfo?(roc)
It sounds plausible. Of course, "1/20 of our frames are late" could mean that some group of users gets a lot of late frames and other users are fine. I think this data would mainly be useful for assessing whether work like Silk is having an impact.
Flags: needinfo?(roc)
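
The caveat in the comment above can be illustrated with made-up numbers: an aggregate late-frame fraction of 1/20 is consistent with most users seeing no late frames at all. The per-user counts below are invented for the example and are not telemetry data.

```python
# Hypothetical per-user counts: (late_frames, total_frames). Not real data.
users = {f"user_{i}": (0, 1000) for i in range(9)}
users["user_9"] = (500, 1000)  # a single user with many late frames

total_late = sum(late for late, _ in users.values())
total_frames = sum(total for _, total in users.values())
aggregate_fraction = total_late / total_frames

# Per-user late fractions show how unevenly the late frames are distributed.
per_user = {name: late / total for name, (late, total) in users.items()}
unaffected = sum(1 for fraction in per_user.values() if fraction == 0.0)

print(f"aggregate late fraction: {aggregate_fraction:.2f}")
print(f"users with no late frames: {unaffected} of {len(users)}")
```

Here the aggregate is exactly 1/20, yet nine of the ten hypothetical users never see a late frame, so an aggregate histogram alone cannot distinguish this case from one where every user is late 5% of the time.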
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #5)
> It sounds plausible. Of course, "1/20 of our frames are late" could mean
> that some group of users gets a lot of late frames and other users are fine.

Sure. This was only to get a feel for the numbers this probe produces.

We can already see that on Yosemite the 95th percentile is ~8ms, and we hypothesized that this might have to do with SSDs, which are present on most Yosemite systems but not on most systems overall (which are dominated by Windows).

> I think this data would mainly be useful for assessing whether work like
> Silk is having an impact.

Except that this probe is likely to break when moving to Silk, because the scheduling becomes external to the refresh driver, so it might no longer know whether it's late or not.

I think we should keep a similar probe with Silk too. Its output seems very potent to me.