Closed Bug 1004923 Opened 10 years ago Closed 10 years ago

Refactor and optimize Windows implementation of PRMJ_Now

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla32

People

(Reporter: jandem, Assigned: jandem)

References

(Blocks 1 open bug)

Details

Attachments

(12 files)

3.73 KB, patch
n.nethercote
: review+
Details | Diff | Splinter Review
3.34 KB, patch
n.nethercote
: review+
Details | Diff | Splinter Review
7.14 KB, patch
n.nethercote
: review+
Details | Diff | Splinter Review
2.24 KB, patch
n.nethercote
: review+
Details | Diff | Splinter Review
4.49 KB, patch
n.nethercote
: review+
Details | Diff | Splinter Review
10.72 KB, patch
n.nethercote
: review+
Details | Diff | Splinter Review
2.35 KB, patch
n.nethercote
: review+
Details | Diff | Splinter Review
4.58 KB, patch
n.nethercote
: review+
Details | Diff | Splinter Review
3.88 KB, patch
n.nethercote
: review+
Details | Diff | Splinter Review
2.17 KB, patch
n.nethercote
: review+
Details | Diff | Splinter Review
1.71 KB, patch
n.nethercote
: review+
Details | Diff | Splinter Review
1.21 KB, patch
n.nethercote
: review+
Details | Diff | Splinter Review
As the numbers in bug 603872 comment 0 indicate, this new Windows 8 API can make "new Date()" more than 4x faster on Windows. This is important for benchmarks like Peacekeeper.

But first I'm going to post some prmjtime.* cleanup patches, with some simple changes the code is a lot easier to understand.
This patch uses "double" instead of "long double", uses C++-style casts, and kills some minor C-isms.

Note that 'long double' is exactly the same as 'double' on Windows:

"Previous 16-bit versions of Microsoft C/C++ and Microsoft Visual C++ supported the long double, 80-bit precision data type. In Win32 programming, however, the long double data type maps to the double, 64-bit precision data type."

http://msdn.microsoft.com/en-us/library/9cx8xs15.aspx
Attachment #8416360 - Flags: review?(n.nethercote)
Attachment #8416379 - Flags: review?(n.nethercote)
Rewrite comments to use //, move variables to the point where they are used.
Attachment #8416386 - Flags: review?(n.nethercote)
The outer do-while loop has condition "needsCalibration", so if we set needsCalibration to false and set returnedTime, we can instead return returnedTime directly.
Attachment #8416391 - Flags: review?(n.nethercote)
Invert condition to make it easier to read.
Attachment #8416394 - Flags: review?(n.nethercote)
Attachment #8416360 - Flags: review?(n.nethercote) → review+
Comment on attachment 8416379 [details] [diff] [review]
Part 2 - Cleanup NowCalibrate

Review of attachment 8416379 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/prmjtime.cpp
@@ +80,5 @@
> +    static const int64_t TimeToEpochIn100ns = 0x19DB1DED53E8000;
> +    t -= TimeToEpochIn100ns;
> +
> +    // Divide by 10 to convert to microseconds.
> +    return double(t) * 0.1;

Did you multiple because fmul is faster than fdiv?
Attachment #8416379 - Flags: review?(n.nethercote) → review+
Attachment #8416386 - Flags: review?(n.nethercote) → review+
Comment on attachment 8416391 [details] [diff] [review]
Part 4 - Add some early returns

Review of attachment 8416391 [details] [diff] [review]:
-----------------------------------------------------------------

What a horrible function.
Attachment #8416391 - Flags: review?(n.nethercote) → review+
Attachment #8416394 - Flags: review?(n.nethercote) → review+
> Did you multiple because fmul is faster than fdiv?

Oh, I see that code existed elsewhere and you just moved it.
According to the documentation [0], QueryPerformanceFrequency never fails on Windows XP or newer and it also returns a non-zero frequency. I googled a bit and I don't see any reason to believe this is false, so I think it's safe to take advantage of it.

This means we can remove another indentation level in PRMJ_Now.

Also: "The frequency of the performance counter is fixed at system boot and is consistent across all processors. Therefore, the frequency need only be queried upon application initialization, and the result can be cached."

I'll optimize this more in a later patch.

[0] http://msdn.microsoft.com/en-us/library/windows/desktop/ms644905%28v=vs.85%29.aspx
Attachment #8416406 - Flags: review?(n.nethercote)
There's a bug in the original code:

   /* On some dual processor/core systems, we might get an earlier time
      so we cache the last time that we returned */
   calibration.last = js::Max(calibration.last, int64_t(highresTime));
   returnedTime = calibration.last;

This value for returnedTime is never returned.. We return either highresTime or lowresTime, or try again.

I thought this was a problem with one of my patches but they just make this a lot more obvious. I think I'll just remove this code for now; I don't feel confident enough to change this (I'm also worried about DST etc changing the clock on purpose)
See the last comment; if we don't use this calibration.last value we may as well remove it. Also, on Windows 8 and newer we want to use a different API function anyway.
Attachment #8416430 - Flags: review?(n.nethercote)
This patch adds a PRMJ_NowInit function and calls it from JS_Init, so that we can eliminate the PR_CallOnce call in PRMJ_Now.

This patch also moves the QueryPerformanceFrequency call to PRMJ_NowInit, as the frequency is fixed at system boot, so that NowCalibrate doesn't have to check for this.
Attachment #8416436 - Flags: review?(n.nethercote)
Attachment #8416448 - Flags: review?(n.nethercote)
The code has two critical sections: data_lock and calibration_lock. Now whenever we take calibration_lock, we also take data_lock, so calibration_lock is redundant.
Attachment #8416462 - Flags: review?(n.nethercote)
These locks have different spincount, that was probably the reason there was a separate calibration lock (it has spincount 0 so that other threads will sleep immediately, as calibration can take a while).

But we also use MUTEX_SETSPINCOUNT(&calibration.data_lock, 0) on the data_lock before we calibrate (and restore it afterwards), so that basically has the same effect.
We're calling GetSystemTimeAdjustment to determine the kernel tick frequency, but this relies on undocumented behavior:

if (GetSystemTimeAdjustment(&timeAdjustment, &timeIncrement, &timeAdjustmentDisabled)) {
    if (timeAdjustmentDisabled)
        skewThreshold = timeAdjustment / 10.0;
    else
        skewThreshold = timeIncrement / 10.0;
}

According to MSDN [0], timeAdjustment and timeIncrement only have meaning if timeAdjustmentDisabled is false. I tested this and I got a 15.6 ms value (and nothing else) in this case, so apparently it works, but it's still undocumented.

GetSystemTimeAdjustment is also super slow: after removing the call the time for a Date.now() micro-benchmark improves from 2750 ms to 712 ms (!) on Windows 7. No wonder new Date has performance problems on Windows (bug 603872).

Furthermore, a stack overflow answer [1] confirms this is not the right function to use here:

"Summary: GetSystemTimeAdjustment is not the function to look at. This function only tells how and if time-changes are done. Depending on the setting of the multimedia timer interface timeBeginPeriod, the progress of time may be done more often and in smaller portions. Use NtQueryTimerResolution to receive the actual time increment."

Unfortunately NtQueryTimerResolution is an undocumented API. This patch instead hardcodes the tick frequency at 15.6 ms, this seems to be the max possible value. It means we'll recalibrate the clock if the highres and lowres timers diverge by more than 30 ms, I think that's fine.

[0] http://msdn.microsoft.com/en-us/library/windows/desktop/ms724394%28v=vs.85%29.aspx
[1] http://stackoverflow.com/a/11743614
Attachment #8416527 - Flags: review?(n.nethercote)
Use GetSystemTimePreciseAsFileTime if it's available (Windows 8 and newer).
Attachment #8416543 - Flags: review?(n.nethercote)
Summary: Use GetSystemTimePreciseAsFileTime for PRMJ_Now on Windows 8 → Refactor and optimize Windows implementation of PRMJ_Now
> njn, thanks for the (quick) reviews :)

You got lucky that I had some time free on Friday night. The rest will have to wait until Monday, I'm afraid.
(In reply to Nicholas Nethercote [:njn] from comment #19)
> You got lucky that I had some time free on Friday night. The rest will have
> to wait until Monday, I'm afraid.

Sure, that's fine. I probably won't have time to land them before Monday anyway.
Parts 6-12 are also green on Try: https://tbpl.mozilla.org/?tree=Try&rev=947dc057d061
Comment on attachment 8416406 [details] [diff] [review]
Part 6 - QueryPerformanceFrequency never fails on Windows XP+

Review of attachment 8416406 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/prmjtime.cpp
@@ +211,4 @@
>  #ifdef JS_THREADSAFE
>      PR_CallOnce(&calibrationOnce, NowInit);
>  #endif
> +    while (true) {

Good -- I was wondering if this loop could become a |while (true)| loop.
Attachment #8416406 - Flags: review?(n.nethercote) → review+
Comment on attachment 8416430 [details] [diff] [review]
Part 7 - Kill calibration.last, returnedTime

Review of attachment 8416430 [details] [diff] [review]:
-----------------------------------------------------------------

Can CalibrationData::last be removed now? (Maybe you do that in a later patch.)
Attachment #8416430 - Flags: review?(n.nethercote) → review+
Attachment #8416436 - Flags: review?(n.nethercote) → review+
Attachment #8416448 - Flags: review?(n.nethercote) → review+
Comment on attachment 8416462 [details] [diff] [review]
Part 10 - Remove calibration lock

Review of attachment 8416462 [details] [diff] [review]:
-----------------------------------------------------------------

r=me, but I'm a bit hazy on this one -- I don't entirely follow how these spinlocks work.
Attachment #8416462 - Flags: review?(n.nethercote) → review+
Comment on attachment 8416527 [details] [diff] [review]
Part 11 - Remove GetSystemTimeAdjustment call

Review of attachment 8416527 [details] [diff] [review]:
-----------------------------------------------------------------

Windows... sigh.

> This patch instead hardcodes the tick frequency at 15.6 ms, this seems to be the max
> possible value. It means we'll recalibrate the clock if the highres and lowres timers
> diverge by more than 30 ms, I think that's fine.

Can you put these reasoning into a comment? r=me with that.
Attachment #8416527 - Flags: review?(n.nethercote) → review+
Attachment #8416543 - Flags: review?(n.nethercote) → review+
Code cleanups like this are painful but worthwhile. Thanks for doing it.
Parts 6-10 (will land 11 and 12 in a few hours, to help bisection and analysis of possible perf changes):

https://hg.mozilla.org/integration/mozilla-inbound/rev/607823013467
https://hg.mozilla.org/integration/mozilla-inbound/rev/49af8c91e026
https://hg.mozilla.org/integration/mozilla-inbound/rev/d52cc82aade3
https://hg.mozilla.org/integration/mozilla-inbound/rev/844972bf5334
https://hg.mozilla.org/integration/mozilla-inbound/rev/ba31cd1620bd

(In reply to Nicholas Nethercote [:njn] from comment #24)
> Can CalibrationData::last be removed now? (Maybe you do that in a later
> patch.)

It's removed in this patch :)

(In reply to Nicholas Nethercote [:njn] from comment #26)
> Can you put these reasoning into a comment? r=me with that.

Good idea.

(In reply to Nicholas Nethercote [:njn] from comment #27)
> Code cleanups like this are painful but worthwhile. Thanks for doing it.

No problem; thanks again for the fast reviews.
Blocks: 916851
(In reply to Jan de Mooij [:jandem] from comment #16)
> "Summary: GetSystemTimeAdjustment is not the function to look at. This
> function only tells how and if time-changes are done. Depending on the
> setting of the multimedia timer interface timeBeginPeriod, the progress of
> time may be done more often and in smaller portions. Use
> NtQueryTimerResolution to receive the actual time increment."

Note that we do use timeBeginPeriod/timeEndPeriod in the RefreshDriver to get a smaller inteval for high frequency updates; but I don't think that should matter here.  Date.now() can be inaccurate, window.performance.now() is the preferred more accurate timing method.
Blocks: 824495
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: