Open Bug 1814951 Opened 1 year ago Updated 3 months ago

[speedometer] setTimeout throttling on Windows has latency of 15ms vs 4ms for Chrome

Categories

(Core :: Performance, defect)

Performance Impact none

People

(Reporter: tcampbell, Assigned: jlink)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Whiteboard: [sp3])

When chaining setTimeout(0) calls, the HTML spec requires that we clamp the timeout to a minimum of 4ms once calls nest more than five levels deep. When testing this on Firefox builds on Windows, we typically see a latency closer to 16ms.

Simple example: https://jsfiddle.net/sqf2r0wo/

The 16ms is the default Windows timer resolution / kernel tick, which is 15.625ms (64Hz) on a stock system.
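To make the quantization concrete, here's a minimal standalone sketch (mine, not from this bug) that times Sleep(1) with QueryPerformanceCounter. At the default resolution each iteration should report roughly 15.6ms; once something raises the resolution (the profiler, another process) it drops toward 1-2ms:

// Sketch: observe the kernel tick by timing Sleep(1). At the default
// 64Hz resolution each sleep rounds up to the next ~15.6ms tick.
#include <windows.h>
#include <cstdio>

int main() {
    LARGE_INTEGER freq, start, end;
    QueryPerformanceFrequency(&freq);
    for (int i = 0; i < 10; ++i) {
        QueryPerformanceCounter(&start);
        Sleep(1);  // requests 1ms; the actual wake-up is tick-granular
        QueryPerformanceCounter(&end);
        printf("Sleep(1) took %.2f ms\n",
               1000.0 * (end.QuadPart - start.QuadPart) / freq.QuadPart);
    }
    return 0;
}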

Turning on profiling causes the latency issue to go away, which is expected given how the profiler works.

Opening this bug to gather evidence as we evaluate if we need to revisit timer design.

(Also, credit to Doug for re-raising this issue :)

On a Linux machine, I experimented with modifying Chrome builds to adjust the throttle from 4ms to 16ms and looked at the impact on Speedometer 2.1. With that experiment, the Speedometer score drops from 145 to 140, so there continues to be sensitivity to setTimeout scheduling in the benchmark.

(Bug 1808582 covers some notes on building local optimized Chrome builds.)

See https://randomascii.wordpress.com/2020/10/04/windows-timer-resolution-the-great-rule-change/ for some interesting discussion of this. It's not directly on topic, but it mentions off-hand what Chrome does: it sets the global Windows timer interrupt period to 1ms, or 8ms when on battery.
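For illustration, here's a hedged sketch of that policy (my reconstruction from the blog post's description, not Chrome's actual code): pick 1ms on AC power and 8ms on battery, and always pair timeBeginPeriod with a matching timeEndPeriod:

// Sketch of the Chrome-like policy described above (a reconstruction,
// not Chrome's code): 1ms interrupt period on AC, 8ms on battery.
// Link against winmm.lib for timeBeginPeriod/timeEndPeriod.
#include <windows.h>
#include <timeapi.h>
#include <cstdio>

static UINT ChooseTimerPeriodMs() {
    SYSTEM_POWER_STATUS status;
    // ACLineStatus == 0 means we are running on battery.
    if (GetSystemPowerStatus(&status) && status.ACLineStatus == 0)
        return 8;  // gentler on battery
    return 1;
}

int main() {
    UINT period = ChooseTimerPeriodMs();
    if (timeBeginPeriod(period) == TIMERR_NOERROR) {
        printf("timer period raised to %ums\n", period);
        // ... latency-sensitive work runs here ...
        timeEndPeriod(period);  // must match the timeBeginPeriod call
    }
    return 0;
}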

EDIT: This is a weird temporary effect, and after a minute or two things get back to normal.

Using ClockRes I see that when I use the Gecko Profiler the timer resolution changes to 1ms, but when I finish profiling it sticks at a value between 4 and 8ms and doesn't return to 16ms. Restarting the browser gets back to the normal value.

When I get into this mode, I notice a measurable impact on the Speedometer score (3-5%). Our browsertime runs seem to include some initial profiling, and I'm wondering if this is causing us to have strange results in our scores, and also if this is why we have had a hard time reproducing this (and the throttle-depth changes) in CI.
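For anyone who wants to watch this without ClockRes: the value it reports can be read through the undocumented ntdll export NtQueryTimerResolution. A sketch, with the caveat that the signature below is the commonly used reverse-engineered one, not an official API:

// Sketch: query the min/max/current timer resolution the way ClockRes
// does, via the undocumented ntdll export NtQueryTimerResolution.
// Values are in 100ns units, so 156250 == 15.625ms.
#include <windows.h>
#include <cstdio>

typedef LONG (NTAPI* NtQueryTimerResolutionFn)(PULONG MinimumResolution,
                                               PULONG MaximumResolution,
                                               PULONG CurrentResolution);

int main() {
    HMODULE ntdll = GetModuleHandleW(L"ntdll.dll");
    auto query = reinterpret_cast<NtQueryTimerResolutionFn>(
        GetProcAddress(ntdll, "NtQueryTimerResolution"));
    if (!query) return 1;
    ULONG min = 0, max = 0, cur = 0;
    if (query(&min, &max, &cur) == 0 /* STATUS_SUCCESS */) {
        printf("min %.3fms  max %.3fms  current %.3fms\n",
               min / 10000.0, max / 10000.0, cur / 10000.0);
    }
    return 0;
}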

Depends on: 1814966

On try I don't see nearly as exaggerated an effect as I do locally: https://treeherder.mozilla.org/perfherder/comparesubtest?originalProject=try&newProject=try&newRevision=8b4e763d1d7720a900b9840e6336831fc43bb15f&originalSignature=3451139&newSignature=3451139&framework=13&originalRevision=487d5fad0b7e8785ce652405a06084349d875658&page=1. I do think something is fishy here.

Note the following comment in the documentation of timeBeginPeriod:

Prior to Windows 10, version 2004, this function affects a global Windows setting. For all processes Windows uses the lowest value (that is, highest resolution) requested by any process. Starting with Windows 10, version 2004, this function no longer affects global timer resolution. For processes which call this function, Windows uses the lowest value (that is, highest resolution) requested by any process. For processes which have not called this function, Windows does not guarantee a higher resolution than the default system resolution.

One guess at what is going on here is that on systems which don't see an improvement from this change, another process is giving us this win for free. That should generally not be the case though, so I think it's very important that we settle on a strategy for this.

The worker pool for performance tests is labelled win10-64-1803, which could either be an out-of-date name, or they could still be running very old Win10 kernels. I will try to get better info on what is actually running in CI next week.

EDIT: Confirmed that the CI hardware is indeed running Windows 10 1803 builds, so it does have the old (global) timer behaviour.

It would be very interesting to know how setting timeBeginPeriod compares to using the new high-resolution timer flag CREATE_WAITABLE_TIMER_HIGH_RESOLUTION (passed to CreateWaitableTimerEx). Perhaps it lets us have nice things without causing as much of a power regression.
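A minimal sketch of what that path would look like (an illustration of the API, not what TimerThread does today); the flag needs Windows 10 1803 or later:

// Sketch: wait ~4ms on a high-resolution waitable timer instead of
// raising the global interrupt period with timeBeginPeriod. On older
// kernels CreateWaitableTimerExW rejects the flag, so fall back to an
// ordinary waitable timer (with tick-granularity waits).
#include <windows.h>
#include <cstdio>

#ifndef CREATE_WAITABLE_TIMER_HIGH_RESOLUTION
#define CREATE_WAITABLE_TIMER_HIGH_RESOLUTION 0x00000002
#endif

int main() {
    HANDLE timer = CreateWaitableTimerExW(
        nullptr, nullptr, CREATE_WAITABLE_TIMER_HIGH_RESOLUTION,
        TIMER_ALL_ACCESS);
    if (!timer) {
        // Pre-1803 Windows: ordinary waitable timer as fallback.
        timer = CreateWaitableTimerExW(nullptr, nullptr, 0, TIMER_ALL_ACCESS);
    }
    LARGE_INTEGER due;
    due.QuadPart = -40000;  // relative due time in 100ns units: 4ms from now
    SetWaitableTimer(timer, &due, 0, nullptr, nullptr, FALSE);
    WaitForSingleObject(timer, INFINITE);
    printf("timer fired\n");
    CloseHandle(timer);
    return 0;
}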

Justin might have thoughts on this as our newly-found resident timing expert :-).

Performance Impact: --- → none
Flags: needinfo?(jlink)

I probably don't have much to share yet, but this is definitely on my radar and, in fact, something that I'm thinking a bit about right now. This ties in closely with, and is essentially a follow-up to, https://bugzilla.mozilla.org/show_bug.cgi?id=1783405, which is really about making TimerThread efficient and effective. What can be changed as far as timer coalescing on Windows is a bit limited right now (the low frequency of the thread scheduling is essentially forcing some timer coalescing, but not really allowing us to control it). Increasing the frequency could potentially allow us to have both responsive timers and efficient use of wake-ups (or at least give us a better dial for choosing between the two).

Flags: needinfo?(jlink)
Assignee: nobody → jlink
Whiteboard: [sp3]

In experimenting with increasing the thread scheduling frequency via timeBeginPeriod(), I can confirm that, with a 1ms period (what Chrome uses), the delay measured by Ted's test does go down to the expected values and is similar to what I see in Chrome.

Blocks: 1824472
Depends on: 1826224

Bug 1826224 didn't affect sp3 numbers as far as I can see (and I didn't expect it to), right? So perhaps this shouldn't have [sp3] nor block the speedometer3 meta bug?
(This was filed for sp2.1, which has a very different runner.)

Flags: needinfo?(jlink)

smaug: Initially we were seeing a pretty clear improvement locally in SP3 from increasing the timer resolution (and none in CI, for reasons that are understood), so I think that's why it was tagged with sp3.

When checked again more recently, increasing the timer resolution had a much smaller impact locally (probably between 0.5 and 1.0 percent) and still none in CI.

So there probably is some impact to SP3, but it's not huge and it won't be seen in CI.

Flags: needinfo?(jlink)
See Also: → 1872702
Depends on: 1881627

I still can't understand how this could really affect sp3 :), so I've cleared the dependency. Feel free to add it back.
