Open Bug 1810192 Opened 2 years ago Updated 8 months ago

[MacOS][Speedometer2] SetTimeout overruns by 100ms while main-thread is idle

Categories

(Core :: Performance, defect, P3)

Unspecified
macOS
defect

Tracking

()

Performance Impact none

People

(Reporter: tcampbell, Unassigned)

Details

(Whiteboard: I)

The MacOS speedometer2 scores are bimodal with the fast path being 20-25% better than the normal case. https://treeherder.mozilla.org/perfherder/graphs?highlightAlerts=1&highlightChangelogData=1&highlightCommonAlerts=0&series=autoland,3924422,1,13&timerange=7776000

One thing that stands out in the profiles is a lot of idle time on main-thread, and in particular there are setTimeouts of 50ms that overrun by at least 100ms in the slow/normal case while the thread is idle.

Normal case: https://share.firefox.dev/3ZwR7RA
Fast outlier: https://share.firefox.dev/3W5JwX7

Is there a strange interaction with how our timeouts interact with the OS going on?

See Also: → 1795627

(In reply to Bobby Holley (:bholley) from comment #1)

Possibly related: https://webkit.org/blog/13083/speedometer-2-1/

We are tracking that issue as well for the speedometer work, but that involves turning setTimeout(0) into a 4ms pause. What I'm seeing here is a 50ms timeout taking 190ms.

See Also: 1795627

The Linux behaviour is nearly exactly on time, and seems to have different stacks for the sleep (eg, no TaskController frames)

Linux: https://share.firefox.dev/3vZkj6j

EDIT: This is wrong, Windows shows the same stacks with TaskController, so this might just be due to missing inline-frame information in the profiles on CI.

I attempted to reproduce this on a Macbook Pro (2020, M1, OS 11.6) and did not see it the issue (timers were 50-53 ms to execute).

The CI machines are x86_64 machine running on OS 10.16 still, so perhaps there are differences there. I'll attempt again to get profiles from CI with the Timer thread (as suggested by Markus).

I had wondered if this was to do with the "App Nap" feature, but Markus explained that AppNap is a different behaviour and is unrelated.

A more likely culprit suggested by Markus is that MacOS penalizes programs with too many wake ups. See: https://developer.apple.com/forums/thread/124180

If we want to improve MacOS 10.x performance still, it might require some changes around how timers and tasks are organized and what system APIs we use.

It is probably worth someone investigating if this over-budget timer behaviour happens to anyone on MacOS 11 machines. Our CI machines are still on 10.15, so perhaps someone with an up-to-date laptop could test a few times.

Just for triaging purposes marking this P3/S3, but this is blocking speedometer3

Severity: -- → S3
Priority: -- → P3

I found an Intel Mac Mini equivalent to the CI ones (i7-8700B, OS 10.15.7 Catalina) and local testing consistently gives me the fast outlier (about 165 points today). It seems that our CI harness setup is penalizing us by 25% for unknown reasons.

When I run with Firefox Profiler locally, I do see the kernel warnings about too many wakeups, but I do not see them when I benchmark without the profiler.

Performance Impact: --- → none
Whiteboard: [sp3]

I can't quite see how this is anymore an sp3 bug, since it uses different kind of runner.

No longer blocks: speedometer3
Whiteboard: [sp3] → I
You need to log in before you can comment on or make changes to this bug.