Closed
Bug 784859
Opened 12 years ago
Closed 12 years ago
Make the Windows TimeStamp implementation faster
Categories
(Core :: XPCOM, defect)
Tracking
()
RESOLVED
FIXED
mozilla18
People
(Reporter: ehsan.akhgari, Assigned: ehsan.akhgari)
Details
Attachments
(6 files, 9 obsolete files)
2.32 KB,
patch
|
bbondy
:
review+
|
Details | Diff | Splinter Review |
4.81 KB,
patch
|
bbondy
:
review+
|
Details | Diff | Splinter Review |
7.46 KB,
patch
|
bbondy
:
review+
|
Details | Diff | Splinter Review |
2.46 KB,
patch
|
bbondy
:
review+
|
Details | Diff | Splinter Review |
5.15 KB,
patch
|
bbondy
:
review+
|
Details | Diff | Splinter Review |
2.77 KB,
patch
|
bbondy
:
review+
|
Details | Diff | Splinter Review |
I and Jeff spent some time today to improve our Windows TimeStamp implementation. Here is the major changes that we have implemented:
* Use QueryPerformanceCounter directly if the machine has a stable TSC
* Use GetTickCount64 if the platform supports it, and have a fallback for Windows XP which simulates it by calling GetTickCount.
* Avoid locking in almost all calls to TimeStamp::Now for machines without a stable TSC by doing atomic operations.
I'll attach the patches here, and I'll ask Honza for review since he wrote the original code.
Assignee | ||
Comment 1•12 years ago
|
||
Assignee | ||
Comment 2•12 years ago
|
||
Attachment #654421 -
Flags: review?(honzab.moz)
Assignee | ||
Comment 3•12 years ago
|
||
Attachment #654422 -
Flags: review?(honzab.moz)
Assignee | ||
Comment 4•12 years ago
|
||
Attachment #654423 -
Flags: review?(honzab.moz)
Assignee | ||
Comment 5•12 years ago
|
||
Attachment #654424 -
Flags: review?(honzab.moz)
Assignee | ||
Comment 6•12 years ago
|
||
Attachment #654425 -
Flags: review?(honzab.moz)
Comment 7•12 years ago
|
||
Comment on attachment 654421 [details] [diff] [review]
Part 2: Avoid locking to store the computed result in the global variable in CalibratedPerformanceCounter
Review of attachment 654421 [details] [diff] [review]:
-----------------------------------------------------------------
::: xpcom/ds/TimeStamp_windows.cpp
@@ +517,5 @@
> + do {
> + readValue = InterlockedAdd64(destination, 0);
> + if (readValue > newValue)
> + return readValue;
> + } while (readValue != InterlockedCompareExchange64(destination, newValue, readValue));
This will need something like:
+inline LONGLONG _cdecl
+MozInterlockedCompareExchange64(LONGLONG volatile *Destination,
+ LONGLONG Exchange,
+ LONGLONG Comparand)
+{
+ __asm {
+ lea esi,[Comparand]
+ lea edi,[Exchange]
+ mov eax,[esi]
+ mov edx,[esi+4]
+ mov ebx,[edi]
+ mov ecx,[edi+4]
+ mov esi,[Destination]
+ lock cmpxchg8b [esi]
+ }
+}
+
+inline LONGLONG _cdecl
+MozInterlockedGet64(LONGLONG volatile *Source)
+{
+ __asm {
+ xor eax,eax
+ xor edx,edx
+ xor ebx,ebx
+ xor ecx,ecx
+ mov esi,[Source]
+ lock cmpxchg8b [esi]
+ }
+}
+
to work on XP which doesn't have InterlockedCompareExchange64.
InterlockedAdd64 is only supported on Itanium
Comment 8•12 years ago
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #7)
> Comment on attachment 654421 [details] [diff] [review]
> Part 2: Avoid locking to store the computed result in the global variable in
> CalibratedPerformanceCounter
>
> Review of attachment 654421 [details] [diff] [review]:
> -----------------------------------------------------------------
> +
>
> to work on XP which doesn't have InterlockedCompareExchange64.
>
> InterlockedAdd64 is only supported on Itanium
We might also be able to use __InterlockedCompareExchange64
Assignee | ||
Comment 9•12 years ago
|
||
This version of the patches actually compiles! And it's improved much.
Attachment #654420 -
Attachment is obsolete: true
Attachment #654421 -
Attachment is obsolete: true
Attachment #654422 -
Attachment is obsolete: true
Attachment #654423 -
Attachment is obsolete: true
Attachment #654424 -
Attachment is obsolete: true
Attachment #654425 -
Attachment is obsolete: true
Attachment #654420 -
Flags: review?(honzab.moz)
Attachment #654421 -
Flags: review?(honzab.moz)
Attachment #654422 -
Flags: review?(honzab.moz)
Attachment #654423 -
Flags: review?(honzab.moz)
Attachment #654424 -
Flags: review?(honzab.moz)
Attachment #654425 -
Flags: review?(honzab.moz)
Attachment #654857 -
Flags: review?(honzab.moz)
Assignee | ||
Comment 10•12 years ago
|
||
Attachment #654858 -
Flags: review?(honzab.moz)
Assignee | ||
Comment 11•12 years ago
|
||
Attachment #654859 -
Flags: review?(honzab.moz)
Assignee | ||
Comment 12•12 years ago
|
||
Attachment #654860 -
Flags: review?(honzab.moz)
Assignee | ||
Comment 13•12 years ago
|
||
Attachment #654861 -
Flags: review?(honzab.moz)
Assignee | ||
Comment 14•12 years ago
|
||
Attachment #654862 -
Flags: review?(honzab.moz)
Comment 15•12 years ago
|
||
Would this also fix bug 603872?
Comment 16•12 years ago
|
||
(In reply to Trev from comment #15)
> Would this also fix bug 603872?
No. Date() does not use TimeStamp::Now()
Assignee | ||
Comment 17•12 years ago
|
||
Comment on attachment 654857 [details] [diff] [review]
Part 1: Use QueryPerformanceCounter directly if the machine has a stable TSC
Brian, could you please take a stab at reviewing these patches? The code in question is fairly localized, and relatively easy to understand. :-)
Attachment #654857 -
Flags: review?(honzab.moz) → review?(netzen)
Assignee | ||
Updated•12 years ago
|
Attachment #654858 -
Flags: review?(honzab.moz) → review?(netzen)
Assignee | ||
Updated•12 years ago
|
Attachment #654859 -
Flags: review?(honzab.moz) → review?(netzen)
Assignee | ||
Updated•12 years ago
|
Attachment #654860 -
Flags: review?(honzab.moz) → review?(netzen)
Assignee | ||
Updated•12 years ago
|
Attachment #654861 -
Flags: review?(honzab.moz) → review?(netzen)
Assignee | ||
Updated•12 years ago
|
Attachment #654862 -
Flags: review?(honzab.moz) → review?(netzen)
Comment 18•12 years ago
|
||
Comment on attachment 654857 [details] [diff] [review]
Part 1: Use QueryPerformanceCounter directly if the machine has a stable TSC
Review of attachment 654857 [details] [diff] [review]:
-----------------------------------------------------------------
As far as I understand QueryPerformanceCounter is a safe API that uses the time stamp counter register only when it's safe.
Otherwise it provides a safe and fast alternative. I don't understand why it would be gated on the same conditions as the TSC.
Using the TSC register directly would be faster than using QueryPerformanceCounter if it was safe.
See here: http://msdn.microsoft.com/en-us/library/ee417693%28VS.85%29.aspx
I suspect I'm missing some known information though.
Comment 19•12 years ago
|
||
(In reply to Brian R. Bondy [:bbondy] from comment #18)
> Comment on attachment 654857 [details] [diff] [review]
> Part 1: Use QueryPerformanceCounter directly if the machine has a stable TSC
>
> Review of attachment 654857 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> As far as I understand QueryPerformanceCounter is a safe API that uses the
> time stamp counter register only when it's safe.
> Otherwise it provides a safe and fast alternative. I don't understand why
> it would be gated on the same conditions as the TSC.
> Using the TSC register directly would be faster than using
> QueryPerformanceCounter if it was safe.
>
> See here: http://msdn.microsoft.com/en-us/library/ee417693%28VS.85%29.aspx
>
> I suspect I'm missing some known information though.
The problem is that QueryPerformanceCounter isn't always safe. So we need to know when it's actually safe to use.
Comment 20•12 years ago
|
||
> The problem is that QueryPerformanceCounter isn't always safe.
> So we need to know when it's actually safe to use.
In comment 18 I'm really asking for a resource that can explain exactly why it's not safe. I suspect there is some documentation we're going off of, I just don't know where to find it.
The first patch seems to assume that QueryPerformanceCounter and TSC is the same thing, when it's not.
And if we know TSC is safe, why use QueryPerformanceCounter at all?
The only remarks about QueryPerformanceCounter on its MSDN page about a problem with it is:
> On a multiprocessor computer, it should not matter which processor is called.
> However, you can get different results on different processors due to bugs in the
> basic input/output system (BIOS) or the hardware abstraction layer (HAL)
Which may not be an actual be a show stopper for using it.
The other page I mentioned above in Comment 18 states problems with TSC and recommends to use QueryPerformanceCounter instead.
Comment 21•12 years ago
|
||
The reason I'm asking for the reason QueryPerformanceCounter (QPC) is broken by the way, is because I'm being asked to review a patch that may not be using the correct heuristic for checking if QPC works or not.
Comment 22•12 years ago
|
||
Comment on attachment 654857 [details] [diff] [review]
Part 1: Use QueryPerformanceCounter directly if the machine has a stable TSC
Review of attachment 654857 [details] [diff] [review]:
-----------------------------------------------------------------
::: xpcom/ds/TimeStamp_windows.cpp
@@ +38,5 @@
> +
> + __cpuid(regs, 0x80000007);
> + // if bit 8 is set than TSC will run at a constant rate
> + // in all ACPI P-state, C-states and T-states
> + return regs[4] & (1 << 8);
regs[4] is out of bounds!
Attachment #654857 -
Flags: review?(netzen) → review-
Comment 23•12 years ago
|
||
Comment on attachment 654858 [details] [diff] [review]
Part 2: Avoid locking to store the computed result in the global variable in CalibratedPerformanceCounter
Review of attachment 654858 [details] [diff] [review]:
-----------------------------------------------------------------
::: xpcom/ds/TimeStamp_windows.cpp
@@ +39,5 @@
> +
> + __cpuid(cpuInfo.regs, 0);
> + // Only allow Intel CPUs for now
> + if (memcmp(cpuInfo.cpuString, "GenuineIntel", sizeof(cpuInfo.cpuString)))
> + return false;
This needs to be a case insensitive compare, my CPU returns genuineintel for these 8 bytes.
Attachment #654858 -
Flags: review?(netzen) → review-
Comment 24•12 years ago
|
||
*12 bytes
Comment 25•12 years ago
|
||
Comment on attachment 654859 [details] [diff] [review]
Part 3: Refactor TickCount64 to make its signature similar to GetTickCount64
Review of attachment 654859 [details] [diff] [review]:
-----------------------------------------------------------------
::: xpcom/ds/TimeStamp_windows.cpp
@@ +397,5 @@
> // @param gtc
> // Result of GetTickCount(). Passing it as an arg lets us call it out
> // of the common mutex.
> static inline ULONGLONG
> +TickCount64()
While we're here, maybe we should just use GetTickCount64 when it's available? (Vista and later)
Comment 26•12 years ago
|
||
Comment on attachment 654860 [details] [diff] [review]
Part 4: Use the native GetTickCount64 function where available
Review of attachment 654860 [details] [diff] [review]:
-----------------------------------------------------------------
Awesome! Ignore last comment.
Attachment #654860 -
Flags: review?(netzen) → review+
Updated•12 years ago
|
Attachment #654859 -
Flags: review?(netzen) → review+
Comment 27•12 years ago
|
||
Comment on attachment 654861 [details] [diff] [review]
Part 5: Change the implementation of GetTickCount64Fallback so that it never locks
Review of attachment 654861 [details] [diff] [review]:
-----------------------------------------------------------------
::: xpcom/ds/TimeStamp_windows.cpp
@@ +413,5 @@
> + newValue = (oldTop + (1ULL<<32)) | newBottom;
> + } else {
> + newValue = oldTop | newBottom;
> + }
> + } while (old != _InterlockedCompareExchange64(reinterpret_cast<volatile __int64*> (sLastGTCResult),
Use &sLastGTCResult here not sLastGTCResult, otherwise it looks good. r=me with that.
Attachment #654861 -
Flags: review?(netzen)
Updated•12 years ago
|
Attachment #654862 -
Flags: review?(netzen) → review+
Assignee | ||
Comment 28•12 years ago
|
||
(In reply to Brian R. Bondy [:bbondy] from comment #22)
> Comment on attachment 654857 [details] [diff] [review]
> Part 1: Use QueryPerformanceCounter directly if the machine has a stable TSC
>
> Review of attachment 654857 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: xpcom/ds/TimeStamp_windows.cpp
> @@ +38,5 @@
> > +
> > + __cpuid(regs, 0x80000007);
> > + // if bit 8 is set than TSC will run at a constant rate
> > + // in all ACPI P-state, C-states and T-states
> > + return regs[4] & (1 << 8);
>
> regs[4] is out of bounds!
Oops :D
Attachment #654857 -
Attachment is obsolete: true
Attachment #659358 -
Flags: review?(netzen)
Assignee | ||
Comment 29•12 years ago
|
||
(In reply to Brian R. Bondy [:bbondy] from comment #23)
> Comment on attachment 654858 [details] [diff] [review]
> Part 2: Avoid locking to store the computed result in the global variable in
> CalibratedPerformanceCounter
>
> Review of attachment 654858 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: xpcom/ds/TimeStamp_windows.cpp
> @@ +39,5 @@
> > +
> > + __cpuid(cpuInfo.regs, 0);
> > + // Only allow Intel CPUs for now
> > + if (memcmp(cpuInfo.cpuString, "GenuineIntel", sizeof(cpuInfo.cpuString)))
> > + return false;
>
> This needs to be a case insensitive compare, my CPU returns genuineintel for
> these 8 bytes.
Fixed. Also, I had the ordering of registers wrong, and I fixed that as well.
Attachment #654858 -
Attachment is obsolete: true
Attachment #659359 -
Flags: review?(netzen)
Assignee | ||
Comment 30•12 years ago
|
||
Attachment #654861 -
Attachment is obsolete: true
Assignee | ||
Comment 31•12 years ago
|
||
Updated•12 years ago
|
Attachment #659358 -
Flags: review?(netzen) → review+
Comment 32•12 years ago
|
||
Comment on attachment 659359 [details] [diff] [review]
Part 2: Avoid locking to store the computed result in the global variable in CalibratedPerformanceCounter
Review of attachment 659359 [details] [diff] [review]:
-----------------------------------------------------------------
::: xpcom/ds/TimeStamp_windows.cpp
@@ +41,5 @@
> + // Only allow Intel CPUs for now
> + // The order of the registers is reg[1], reg[3], reg[2]. We just adjust the
> + // string so that we can compare in one go.
> + if (_strnicmp(cpuInfo.cpuString, "GenuntelineI", sizeof(cpuInfo.cpuString)))
> + return false;
Lookeat!s Gr
:)
Attachment #659359 -
Flags: review?(netzen) → review+
Updated•12 years ago
|
Attachment #659360 -
Flags: review+
Assignee | ||
Comment 33•12 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/d291e2674dc6
https://hg.mozilla.org/integration/mozilla-inbound/rev/2a8acb221714
https://hg.mozilla.org/integration/mozilla-inbound/rev/9b7184f6e154
https://hg.mozilla.org/integration/mozilla-inbound/rev/64ea34bc583e
https://hg.mozilla.org/integration/mozilla-inbound/rev/9544f94ccdb8
https://hg.mozilla.org/integration/mozilla-inbound/rev/d4a8c7efef04
Target Milestone: --- → mozilla18
Comment 34•12 years ago
|
||
It would've been nice to get actual performance numbers for this before checking it in.
Comment 35•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/d291e2674dc6
https://hg.mozilla.org/mozilla-central/rev/2a8acb221714
https://hg.mozilla.org/mozilla-central/rev/9b7184f6e154
https://hg.mozilla.org/mozilla-central/rev/64ea34bc583e
https://hg.mozilla.org/mozilla-central/rev/9544f94ccdb8
https://hg.mozilla.org/mozilla-central/rev/d4a8c7efef04
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment 36•12 years ago
|
||
Comment on attachment 659358 [details] [diff] [review]
Part 1: Use QueryPerformanceCounter directly if the machine has a stable TSC
Review of attachment 659358 [details] [diff] [review]:
-----------------------------------------------------------------
::: xpcom/ds/TimeStamp_windows.cpp
@@ +33,5 @@
> +
> + // detect if the Advanced Power Management feature is supported
> + __cpuid(regs, 0x80000000);
> + if (regs[0] < 0x80000007)
> + return false;
This sure is indented strangely.
Assignee | ||
Comment 37•12 years ago
|
||
Comment on attachment 659359 [details] [diff] [review]
Part 2: Avoid locking to store the computed result in the global variable in CalibratedPerformanceCounter
(In reply to :Ms2ger from comment #36)
> Comment on attachment 659358 [details] [diff] [review]
> Part 1: Use QueryPerformanceCounter directly if the machine has a stable TSC
>
> Review of attachment 659358 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: xpcom/ds/TimeStamp_windows.cpp
> @@ +33,5 @@
> > +
> > + // detect if the Advanced Power Management feature is supported
> > + __cpuid(regs, 0x80000000);
> > + if (regs[0] < 0x80000007)
> > + return false;
>
> This sure is indented strangely.
> // detect if the Advanced Power Management feature is supported
> __cpuid(regs, 0x80000000);
> if (regs[0] < 0x80000007)
>- return false;
>+ return false;
:-)
Comment 38•12 years ago
|
||
Comment on attachment 659359 [details] [diff] [review]
Part 2: Avoid locking to store the computed result in the global variable in CalibratedPerformanceCounter
Review of attachment 659359 [details] [diff] [review]:
-----------------------------------------------------------------
::: xpcom/ds/TimeStamp_windows.cpp
@@ +598,2 @@
>
> + return AtomicStoreIfGreaterThan(&sLastResult, result);
For original QPC integration and calibration I had a code that was only using Interlock* functions. It was horribly slow, since Interlock is slow the same way as critical section entering.
Now you have critical section AND interlock, it is imo slower now.
I know lock is later removed at all, but in cost of adding just more Interlocking*...
Comment 39•12 years ago
|
||
Comment on attachment 659360 [details] [diff] [review]
Part 5: Change the implementation of GetTickCount64Fallback so that it never locks
Review of attachment 659360 [details] [diff] [review]:
-----------------------------------------------------------------
::: xpcom/ds/TimeStamp_windows.cpp
@@ +434,2 @@
>
> + return newValue;
I had a similar (maybe better ;)) code that was using Interlocks* but that was much much slower then when I had just a single lock. So I abandoned this approach.
So, IMO this is regressing performance and may also bring some gtc/qpc distance issues that are critical to checking QPC jitter...
Comment 40•12 years ago
|
||
Comment on attachment 654862 [details] [diff] [review]
Part 6: Remove the need for locking in all calls to TimeStamp::Now
Review of attachment 654862 [details] [diff] [review]:
-----------------------------------------------------------------
::: xpcom/ds/TimeStamp_windows.cpp
@@ +199,5 @@
> + bool fallBackToGTC;
> + bool forceRecalibrate;
> + } flags;
> + uint32_t dwordValue;
> +} sCalibrationFlags;
Why then there aren't just two uint32_t global vars rather?
@@ +507,1 @@
> RecordFlaw();
RecordFlaw does sCalibrationFlags.flags.fallBackToGTC = true. That can be turned to 32 bit write that may be freely concurent since we only switch to |true|.
Hence I think you don't need lock here.
@@ +519,2 @@
>
> + sCalibrationFlags.flags.forceRecalibrate = false;
Why are you actually locking here?
@@ +574,1 @@
> LONGLONG diff = qpc - ms2mt(gtc) - sSkew;
Read access to 64bit sSkew is atomic?
You need to log in
before you can comment on or make changes to this bug.
Description
•