Open Bug 1863455 Opened 1 year ago Updated 1 year ago

Consider using QueryPerformancyCounter directly for PRMJ_Now instead via GetSystemTimePreciseAsFileTime

Categories

(Core :: JavaScript Engine, task, P2)

task

Tracking

()

People

(Reporter: jrmuizel, Unassigned, NeedInfo)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Whiteboard: [sp3])

It seems like GetSystemTimePreciseAsFileTime is adding a lot of overhead and I'm not sure it's helpful. V8 implements Date.now() on top QueryPerformanceCounter directly.

FWIW, this shows up in the backbone.js SP3 test under the debounce/debounced< function.

Blocks: speedometer3

We used to have a pretty complicated implementation of this that I replaced years ago with GetSystemTimePreciseAsFileTime for Windows 8+. A few months ago I removed the old implementation in this commit when we dropped support for Windows < 10.

I vaguely remember the old code either being slower or having similar performance back then, but that was a long time ago on Windows 8. It's probably worth checking the V8 code to see if they have a nicer version.

Severity: -- → S3
Priority: -- → P2
Whiteboard: [sp3]

Based on an initial skim of the V8 code, they have a fast version that uses QueryPerformanceCounter directly and also a slower implementation they use in some cases. Maybe we could use GetSystemTimePreciseAsFileTime as our slow path and carve out a fast path using QueryPerformanceCounter.

NI myself to look into this more.

Flags: needinfo?(jdemooij)

V8 uses QueryPerformanceCounter after checking CPUID for non-stop TSC and checking that QueryPerformanceFrequency returns a frequency > 0 (I think that's guaranteed to be true since XP).

They sync this timestamp with the system time every 60 seconds (kMaxTimeToAvoidDrift) in subtle::TimeNowIgnoringOverride. This uses GetSystemTimeAsFileTime as "base" wall clock.

I noticed we already have similar QPC timestamp code in mozglue/misc/TimeStamp_windows.cpp. For performance.now(), we use TimeStamp::Now() (in Performance::NowUnclamped).

One option is to use our TimeStamp code with periodic sync iff sHasStableTSC. If sHasStableTSC is false, we'd fall back to the current GetSystemTimePreciseAsFileTime.

Hey Jeff, which Windows version was this on? It's not a VM right?

I have a WIP patch to use TimeStamp::Now() and sync it if > 20 seconds have elapsed. On my laptop (Intel i7 8th gen, Windows 11) it's ~20% slower in the JS shell for a Date.now() micro-benchmark. It does use the HasStableTSC branch.

The numbers I get:

  • Chrome Canary: 495 ms
  • FF Try PGO build JS shell, without patch: 581 ms
  • FF Try PGO build JS shell, with patch: 702 ms
  • FF Nightly: 870 ms

I think a lot of the browser overhead is from timer precision/fingerprinting mitigations.

Flags: needinfo?(jdemooij) → needinfo?(jmuizelaar)

(In reply to Jan de Mooij [:jandem] from comment #5)

Hey Jeff, which Windows version was this on? It's not a VM right?

This was win10 on AMD Ryzen hardware.

Can you get before and after profiles using https://github.com/jrmuizel/etw-profiler/?

Flags: needinfo?(jmuizelaar) → needinfo?(jdemooij)

(In reply to Jeff Muizelaar [:jrmuizel] from comment #6)

Can you get before and after profiles using https://github.com/jrmuizel/etw-profiler/?

Before: https://share.firefox.dev/42fg19V
After: https://share.firefox.dev/42el4Yn

In the Before profile, we spend relatively more time under GetSystemTimePreciseAsFileTime in RtlQueryPerformanceCounter than in your profile. Maybe they optimized this or maybe it's because I'm calling it in a tight loop?

In the After profile, the critical section in mozilla::PerformanceCounter is pretty expensive. I'm not sure if that code to handle jitter is still necessary, but even if I comment that out, we're still slower. I haven't investigated the remaining gap yet.

Bug 1515155 is about how to fixing the critical section locking.(In reply to Jan de Mooij [:jandem] from comment #8)

(In reply to Jeff Muizelaar [:jrmuizel] from comment #6)

Can you get before and after profiles using https://github.com/jrmuizel/etw-profiler/?

Before: https://share.firefox.dev/42fg19V
After: https://share.firefox.dev/42el4Yn

In the Before profile, we spend relatively more time under GetSystemTimePreciseAsFileTime in RtlQueryPerformanceCounter than in your profile. Maybe they optimized this or maybe it's because I'm calling it in a tight loop?

That could be system/CPU differences or perhaps a caching thing because my profile doesn't have the tight loop.

In the After profile, the critical section in mozilla::PerformanceCounter is pretty expensive. I'm not sure if that code to handle jitter is still necessary, but even if I comment that out, we're still slower. I haven't investigated the remaining gap yet.

Bug 1515155 is about how to remove that locking in the common case.

Depends on: 1515155
You need to log in before you can comment on or make changes to this bug.