Open Bug 563082 Opened 14 years ago Updated 2 years ago

Consider switching back to QueryPerformanceCounter on Windows

Categories

(NSPR :: NSPR, defect)

x86
macOS
defect

Tracking

(Not tracked)

People

(Reporter: ehsan.akhgari, Unassigned)

References

Details

Bug 115865 switched from QueryPerformanceCounter to timeGetTime on Windows for getting NSPR intervals.

Although MSDN says that timeGetTime should have a resolution of 1ms, I've observed that it often has a resolution of 15ms, which is very bad if you need intervals of small amounts.

I think the original problem on multi-core machines that caused us to switch to timeGetTime can be solved by using SetThreadAffinityMask to make sure that the OS doesn't change the core we're running on until we're done grabbing the counter value.

Therefore I think we can consider switching back to QueryPerformanceCounter again.
I thought bug of HAL of Win(reason why changed to timeGetTime) was bypassed by bug 363258 successfully and QueryPerformanceCounter is already used again.
Do you still see 15ms resolution? On what kind of CPU? MS Win on Virtual Machine?
FYI.
Change from QueryPerformanceCounter to timeGetTime was done by bug 307527 instead of Bug 115865, to bypass bug of MS Win's HAL. (see 363258 comment #11, please)
failed to linkify.  (see bug 363258 comment #11, please)
(In reply to comment #1)
> I thought bug of HAL of Win(reason why changed to timeGetTime) was bypassed by
> bug 363258 successfully and QueryPerformanceCounter is already used again.
> Do you still see 15ms resolution? On what kind of CPU? MS Win on Virtual
> Machine?

I'm talking about the NSPR interval API:

http://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/md/windows/ntinrval.c#71

I fixed the problem in my repository in these two changesets:

http://hg.mozilla.org/users/eakhgari_mozilla.com/startup-timeline/rev/cf7b100f8b66
http://hg.mozilla.org/users/eakhgari_mozilla.com/startup-timeline/rev/da1c9c08d470

I basically restored the old code before bug 307527.

(In reply to comment #2)
> FYI.
> Change from QueryPerformanceCounter to timeGetTime was done by bug 307527
> instead of Bug 115865, to bypass bug of MS Win's HAL. (see 363258 comment #11,
> please)

Hmm, right, I took the bug # from the wrong tab in comment 0.  :-)

But as far as I can tell, bug 363258 only fixed this for the JS time API, which is not what I'm using.
(In reply to comment #4)
> I basically restored the old code before bug 307527.

See Bug 363258 comment #36, please. This is the bug of HAL of MS Win and the reason why timeGetTime was used to avoid the bug.
Hot fix is already available, and some venders already shipped HAL update as "driver update of PC", and I believe newest PC with newest version/SP of new MS Win are shipped with the bug fixed.
However, there still are users who don't have the bug fix of HAL. Please note that applying of hot fix is impossible for general users. 

Is the bug of HAL of MS Win bypassed by your modification?
I said "bug of HAL of MS Win". But it is not accurate. Bug is: MS used TSC as system timer/counter for multi-cpu version of Win, without sufficient care for hardware spec/behaviour of CPU, ACPI especially when multi-cpu, even though single CPU version of MS Win uses Power Management Timer as system timer/counter. Other OS'es probably worked around problem by use of Power Managemet Timer as system timer. However, MS has to support direct RDTSC use instead of QueryPerformanceCounter call by user program such as game, video, audio. So MS's solution/answer was "correct scew of TSC in HAL for multi-cpu".
As written in bug 307527, /usepmtimer is workaround of problem if MS Win XP SP2 or later, and some general users can do it by himself. But we can't force it to all general users. And, we had to care for many XP SP1 users(/usepmtimer can't be used if SP1 or before) when bug 307527 was fixed.

Can we already ignore XP SP1? Can we guide users who will meet bug 307527 to use /usepmtimer?
(In reply to comment #5)
> (In reply to comment #4)
> > I basically restored the old code before bug 307527.
> 
> See Bug 363258 comment #36, please. This is the bug of HAL of MS Win and the
> reason why timeGetTime was used to avoid the bug.
> Hot fix is already available, and some venders already shipped HAL update as
> "driver update of PC", and I believe newest PC with newest version/SP of new MS
> Win are shipped with the bug fixed.
> However, there still are users who don't have the bug fix of HAL. Please note
> that applying of hot fix is impossible for general users. 

Do you know exactly when the fix to that was shipped?  Like, which service pack of XP does it come with?

> Is the bug of HAL of MS Win bypassed by your modification?

To the best of my understanding, the bug was a result of switching cores on which the code is running while it's executing QueryPerformanceCounter, right?  In that case, yes.

(In reply to comment #6)
> I said "bug of HAL of MS Win". But it is not accurate. Bug is: MS used TSC as
> system timer/counter for multi-cpu version of Win, without sufficient care for
> hardware spec/behaviour of CPU, ACPI especially when multi-cpu, even though
> single CPU version of MS Win uses Power Management Timer as system
> timer/counter. Other OS'es probably worked around problem by use of Power
> Managemet Timer as system timer. However, MS has to support direct RDTSC use
> instead of QueryPerformanceCounter call by user program such as game, video,
> audio. So MS's solution/answer was "correct scew of TSC in HAL for multi-cpu".
> As written in bug 307527, /usepmtimer is workaround of problem if MS Win XP SP2
> or later, and some general users can do it by himself. But we can't force it to
> all general users. And, we had to care for many XP SP1 users(/usepmtimer can't
> be used if SP1 or before) when bug 307527 was fixed.
> 
> Can we already ignore XP SP1? Can we guide users who will meet bug 307527 to
> use /usepmtimer?

Hmm, I think it's been a few years since Microsoft has stopped supporting XP and XP SP1.  I don't know what the NSPR policies are in that regard.  Is this something that we expect people to encounter with the latest OS updated applied?
(In reply to comment #7)
> Do you know exactly when the fix to that was shipped?

See next article which I pointed in bug 307527 #101, please.
> http://support.microsoft.com/kb/Q896256/?LN=en-us&x=20&y=22

> Like, which service pack of XP does it come with?

I don't know.
Some venders really had provided the HAL change as "driver update" at that time. So, I think newer PC was shipped with the HAL change included by PC venders.  
I use Core2Duo PC by HP shipped with XP SP2, and I already upgraded to SP3.
Do I need to check history of HAL for multi-cpu on my PC?

> Hmm, I think it's been a few years since Microsoft has stopped supporting XP and XP SP1.

If we can assume XP SP2 or newer, or we can say "XP or XP SP1 is not supported by Tb" to user, we can recommend /usepmtimer to user even if user will meat bug 307527 after change by this bug.

> To the best of my understanding, the bug was a result of switching cores
> on which the code is running while it's executing QueryPerformanceCounter, right? 

Yes, right.

> In that case, yes.

Does it mean that your patch sets CPU affinity for QueryPerformaceCounter call?
If so, I think it may produce other issue.
Real time interval for one TSC count difference depends on CPU's status by ACPI.
Supplied clock rate is reduced if CPU enters in slow down mode for power saving, but no application can know the clock rate change, no application can know real time interval for one TSC count difference when the CPU is in slow down mode.
I think it's one of may reasons why Wan-Teh Chang didn't select workaround of CPU affinity in solution for bug 307527 and why Rob Arnold didn't select workaround of CPU affinity in solution for bug 363258.
I think solution of TSC skew issue should be one of next instead of CPU affinity use by application.
  - Use Power Management Timer for timer/counter instead of TSC,
    as MS Win for single CPU does, as other OS'es does.
  - As MS does, modify TSC value if skew is detected, for RDTSC users
    such as QueryPerfomanceCounter on multi-CPU.
(In reply to comment #8)
> (In reply to comment #7)
> > Do you know exactly when the fix to that was shipped?
> 
> See next article which I pointed in bug 307527 #101, please.
> > http://support.microsoft.com/kb/Q896256/?LN=en-us&x=20&y=22

Hmm, a little digging around showed that this hotfix is included in SP3.

http://support.microsoft.com/kb/946480

> > Like, which service pack of XP does it come with?
> 
> I don't know.
> Some venders really had provided the HAL change as "driver update" at that
> time. So, I think newer PC was shipped with the HAL change included by PC
> venders.  
> I use Core2Duo PC by HP shipped with XP SP2, and I already upgraded to SP3.
> Do I need to check history of HAL for multi-cpu on my PC?

No, I don't think information from a single system is going to help us here much.

> > Hmm, I think it's been a few years since Microsoft has stopped supporting XP and XP SP1.
> 
> If we can assume XP SP2 or newer, or we can say "XP or XP SP1 is not supported
> by Tb" to user, we can recommend /usepmtimer to user even if user will meat bug
> 307527 after change by this bug.

Well, you indicated that even SP2 users may have problems here, and they should really switch to SP3, right?

> > To the best of my understanding, the bug was a result of switching cores
> > on which the code is running while it's executing QueryPerformanceCounter, right? 
> 
> Yes, right.
> 
> > In that case, yes.
> 
> Does it mean that your patch sets CPU affinity for QueryPerformaceCounter call?

Yes.  You can actually see the code here:

http://hg.mozilla.org/users/eakhgari_mozilla.com/startup-timeline/file/tip/nsprpub/pr/src/md/windows/ntinrval.c

> If so, I think it may produce other issue.
> Real time interval for one TSC count difference depends on CPU's status by
> ACPI.
> Supplied clock rate is reduced if CPU enters in slow down mode for power
> saving, but no application can know the clock rate change, no application can
> know real time interval for one TSC count difference when the CPU is in slow
> down mode.
> I think it's one of may reasons why Wan-Teh Chang didn't select workaround of
> CPU affinity in solution for bug 307527 and why Rob Arnold didn't select
> workaround of CPU affinity in solution for bug 363258.
> I think solution of TSC skew issue should be one of next instead of CPU
> affinity use by application.
>   - Use Power Management Timer for timer/counter instead of TSC,
>     as MS Win for single CPU does, as other OS'es does.
>   - As MS does, modify TSC value if skew is detected, for RDTSC users
>     such as QueryPerfomanceCounter on multi-CPU.

Is there any way for us to detect this problem at runtime?
We need some kind of a fix for this bug in order to take the work done in bug 560647 on mozilla-central.
Blocks: 560647
(In reply to comment #9)
> Hmm, a little digging around showed that this hotfix is included in SP3.
> http://support.microsoft.com/kb/946480

You can probably use QueryPerformanceCounter on multi-cpu/acpi with no care for TSC skew issue now.
I think many of XP users already use SP3, and many of XP SP2 PC was shipped with the hotfix included by PC venders. If SP2 user doesn't have the HAL update yet, user can install it by upgrade to SP3 or by install of "driver update" which is already provided by many PC venders. And, even if SP2 user can't upgrade to SP3 yet for some reasons, there is effective/proper workaround of /usepmtimer.

> Well, you indicated that even SP2 users may have problems here,

Yes.

> and they should really switch to SP3, right?

Yes, but partially wrong. We can recommend SP3 to users, but can't force XP SP2 user to upgrade to SP3 due to problem of Tb's bug 307527. So, we need to provide information about workaround of /usepmtimer to such users.

> Is there any way for us to detect this problem at runtime?

AFAIK, answer is No. I think bug 363258 comment #19 by Rob Arnold is good document for you. It's a most important base of my comments. See some other comments by him for QueryPerformaneCounter use when TSC skew problem exists, please.
CCing Rob here to see if he has any ideas how to solve this problem.
TSC skew between two processors was only one of the issues. The other major one was that dynamic changes in clock frequency will mess up the results. These happen all the time on single processor systems due to changes in CPU workload.

I still have a system which used to exhibit the problem. I can check it again for errors but it was XP SP2 at the time which we still support (it is now XP SP3).

If a 1ms resolution from the nspr interval code is what you want, call timerBeginPeriod(1) early on in startup. This can take up to 15ms to have an effect on XP but it was "instant" on my Vista system in 2007. Be sure to call timerEndPeriod(1) when you are done measuring as it has reduces battery life by up to 25%. It may also interfere with your measurements more as it causes the kernel to be interrupted more frequently (though I suppose at startup we're making lots of system calls anyways).

For more detailed information, I'll shamelessly plug a blog post I wrote a while ago: http://robarnold.org/measuring-performance/
(In reply to comment #13)
> The other major one was that dynamic changes in clock frequency will mess up the results.
> These happen all the time on single processor systems due to changes in CPU workload.

Is it really true on real sigle processor system? Please note that Hypert-Threading of Intel is considered multi-cpu enen if physically single CPU.
AFAIK, MS Win XP of single CPU version used Power Management Timer(which is not affected by ACPI) instead of TSC, although MS Win XP of multi-CPU version(including for Hyper Threading of Intel) used/uses TSC instead of Power Management Timer, and Linux solution was/is use Power Management Timer instea of TSC. (sorry but I don't know about Mac OS X).
Is Power Management Timer affected by power saving of ACPI?  

> I can check it again for errors but it was XP SP2 at the time which we still support (it is now XP SP3).

XP SP2 only issue(TSC skew problem is not resolved by install of change of HAL for multi-cpu provided by hotfix or SP3 by MS, and workaround of /usepmtimer is not introruced yet user), isnt't it?

We are talking about QueryPerformanceCounter. Will /usepmtimer be proper solution for us of issue around ACPI on multi-CPU?
(In reply to comment #14)
> (In reply to comment #13)
> > The other major one was that dynamic changes in clock frequency will mess up the results.
> > These happen all the time on single processor systems due to changes in CPU workload.
> 
> Is it really true on real sigle processor system? Please note that
> Hypert-Threading of Intel is considered multi-cpu enen if physically single
> CPU.

Yes. My system is a single logical core Centrino @ 2 Ghz.

> AFAIK, MS Win XP of single CPU version used Power Management Timer(which is not
> affected by ACPI) instead of TSC, although MS Win XP of multi-CPU
> version(including for Hyper Threading of Intel) used/uses TSC instead of Power
> Management Timer, and Linux solution was/is use Power Management Timer instea
> of TSC. (sorry but I don't know about Mac OS X).
> Is Power Management Timer affected by power saving of ACPI?  

So we need to make another unfortunate distinction here. XP Home and XP Pro ship with different HALs. It's a little-remembered "feature" that XP Home only uses a single processor (physical, not logical if I remember correctly). They actually ship with different HALs so we'd need to test with XP Home on various machines to know if they have the bug. It's been rather difficult to find a XP Home system - hopefully QA has one.

I believe what you mean by Power Management Timer is a tiny off-chip device that does update at regular intervals while the machine is on. Suspend/Hibernate will mess with its values so I am reluctant to suggest it as a straightforward replacement for all users of NSPR's Interval module. However, in the use case of startup timeline, it's clearly acceptable.

My Vista machine used it, I am not sure if my XP machine has one. Do you know how common these are? Keep in mind that Firefox can still run on pretty old machines.

> > I can check it again for errors but it was XP SP2 at the time which we still support (it is now XP SP3).
> 
> XP SP2 only issue(TSC skew problem is not resolved by install of change of HAL
> for multi-cpu provided by hotfix or SP3 by MS, and workaround of /usepmtimer is
> not introruced yet user), isnt't it?

I am not sure what you're saying here.

> We are talking about QueryPerformanceCounter. Will /usepmtimer be proper
> solution for us of issue around ACPI on multi-CPU?

I don't think asking users to change their boot configuration is a proper solution.
(In reply to comment #15)
> I believe what you mean by Power Management Timer is a tiny off-chip device
> that does update at regular intervals while the machine is on.

Right.

Following is a PDF found by Google search for 'PC "Power Management Timer" TSC'.
> http://svn.wildfiregames.com/public/ps/trunk/docs/timing_pitfalls.pdf
> Timing Pitfalls and Solutions
The content is base source of my knowledge about PC timing hardware.
I saw completely same content as this document at some Web sites, in HTML, in text. The PDF looks for me original document of copies at some Web sites.  
Note: Content-Type: text/plain is returned. Save to disk, please.

Power Management Timer(PMT) was a part of ACPI standard. I think CPU which supports ACPI standard should have PMT, even if PMT can be equipped on CPU which doesn't support ACPI. Problem around TSC(variable CPU clock frequency, and skew due to it) occurs only when ACPI is supported and is enabled. And, it was merely "skew of TSC won't occur if single-cpu". 

"skew of TSC" is multi-cpu with ACPI only issue.
"variable CPU clock frequency" occurs on both single-cpu with ACPI and mullti-cpu with ACPI.
Problem around hybernation can occur on both CPU supports ACPI and CPU doesn't support ACPI.
Problem around "clock set to future or past by user while running" can occur on any CPU.

> > We are talking about QueryPerformanceCounter. Will /usepmtimer be proper
> > solution for us of issue around ACPI on multi-CPU?
> I don't think asking users to change their boot configuration is a proper
> solution.

I wanted to say "proper solution by *OS* is /usepmtimer", not by user nor by application.

Following is seen in PDF document I pointed at above. It may be a reason why /usepmtimer is not set as default even when PC of CPU with ACPI is shipped with XP SP2 or later.
> 4 Pitfalls 
> (snip)
> PMT Results are undefined if the timer is not polled at least once every
> 4.6 seconds9. This is caused by hardware bugs and/or incorrect software
> handling of counter overflow.
(In reply to comment #13)
> TSC skew between two processors was only one of the issues. The other major one
> was that dynamic changes in clock frequency will mess up the results. These
> happen all the time on single processor systems due to changes in CPU workload.
> 
> I still have a system which used to exhibit the problem. I can check it again
> for errors but it was XP SP2 at the time which we still support (it is now XP
> SP3).

It would be great if you can give that a shot.

> If a 1ms resolution from the nspr interval code is what you want, call
> timerBeginPeriod(1) early on in startup. This can take up to 15ms to have an
> effect on XP but it was "instant" on my Vista system in 2007. Be sure to call
> timerEndPeriod(1) when you are done measuring as it has reduces battery life by
> up to 25%. It may also interfere with your measurements more as it causes the
> kernel to be interrupted more frequently (though I suppose at startup we're
> making lots of system calls anyways).

I ended up using this workaround for now.

From the discussion, it seems to me that there is no good solution to this problem, unfortunately. :(
(In reply to comment #17)
> From the discussion, it seems to me that there is no good solution to this
> problem, unfortunately. :(

I think following documents are some of good documents to know about timer related issues and solutions.
(A) IBM developerWorks document (Date:  01 Apr 2002)
> http://www.ibm.com/developerworks/ibm/library/i-seconds/
> When microseconds matter
> I don't know it's for April Fools' Day or not :-)
(B) IBM developerWorks document (Date:  30 Mar 2010)
> http://www.ibm.com/developerworks/linux/library/l-timers-list/
> Kernel APIs, Part 3: Timers and lists in the 2.6 kernel
(C) A document linked in (B) (July 19th–22nd, 2006)
> http://www.kernel.org/doc/ols/2006/ols2006v1-pages-333-346.pdf
> Hrtimers and Beyond: Transforming the Linux Time Subsystems (PDF).
I've long stopped working on the project which required this change, but we can probably use a lot of the techniques which we have employed in http://mxr.mozilla.org/mozilla-central/source/xpcom/ds/TimeStamp_windows.cpp for the Windows implementation of the PRTime API as well.

The bug assignee is inactive on Bugzilla, so the assignee is being reset.

Assignee: wtc → nobody
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.