Closed Bug 1345939 Opened 7 years ago Closed 7 years ago

javascript: performance.now malfunctions / returns bad values, on AMD Phenom II

Categories

(Core :: XPCOM, defect)

52 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox-esr45 --- wontfix
firefox-esr52 --- affected
firefox54 - wontfix
firefox55 + wontfix
firefox56 + fixed

People

(Reporter: somnia, Assigned: mayhemer)

Details

Attachments

(2 files, 1 obsolete file)

Attached file jsbug.htm
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0
Build ID: 20161215004017

Steps to reproduce:

Run a short javascript program. File attached.
Wait for about 2 minutes. The text "pos" turns into "error:NEG".

Note: Wrong result ONLY on Windows 7 + Machine A (Phenom II 955, AMD 770). 
On two other machines (Intel) + Win7, results are OK. 
On Linux + Machine A, OK.
On Google Chrome + Win7 + Machine A, results OK.

On Firefox 52 + Win7 + Machine A, FAIL.
On Firefox Beta 53.0b1 64-bit + Win7 + Machine A, FAIL.
On Firefox DE 52.0a2 + Win7 + Machine A, FAIL.

Note: I have noticed this bug earlier, it is there for at least a year, if not much longer.


Actual results:

Wrong result of execution.

Wait for about 2 minutes. The text "pos" turns into "error:NEG" (because performance.now() returned a negative number). Afterwards, the number "tm" is 16 most of the time, indicating that performance.now() downgraded to video refresh instead of High Performance Timer.


Expected results:

The text "pos" should have remained visible. "error:NEG" should not be displayed. 
The number "tm" should be oscillating, usually around 4 (ms) on desktops. There is no reason for the number "tm" to be 16 (ms) if High Perfomance Timer is available.
Note: to reproduce the bug, restart Firefox before running the attached file. Otherwise, "tm" might already be 16 on start and the bug won't show itself.
Correction: performance.now() doesn't return a negative number. What the program is detecting is when performance.now() goes backward in time.
Component: Untriaged → JavaScript Engine
Product: Firefox → Core
Honza, do you know what might be going on with the timers on this processor?
Flags: needinfo?(honzab.moz)
Component: JavaScript Engine → DOM
(In reply to Ben Kelly [not reviewing due to deadline][:bkelly] from comment #3)
> Honza, do you know what might be going on with the timers on this processor?

What is the underlying implementation of performance.now()?
Flags: needinfo?(honzab.moz) → needinfo?(bkelly)
Note that TimeStamp::Now() is not protected on windows to go backward.  Should be added at [1] when TSC is not stable.

[1] https://dxr.mozilla.org/mozilla-central/rev/34585620e529614c79ecc007705646de748e592d/mozglue/misc/TimeStamp_windows.cpp#199
TimeStamp::Now() and compared to a timestamp taken at start of navigation:

https://dxr.mozilla.org/mozilla-central/source/dom/performance/Performance.cpp#146

The creation timestamp comes from here for a window:

https://dxr.mozilla.org/mozilla-central/source/dom/base/nsDOMNavigationTiming.cpp#77
Flags: needinfo?(bkelly)
(In reply to Ben Kelly [not reviewing due to deadline][:bkelly] from comment #6)
> TimeStamp::Now() and compared to a timestamp taken at start of navigation:
> 
> https://dxr.mozilla.org/mozilla-central/source/dom/performance/Performance.
> cpp#146
> 
> The creation timestamp comes from here for a window:
> 
> https://dxr.mozilla.org/mozilla-central/source/dom/base/
> nsDOMNavigationTiming.cpp#77

Thanks.  I'll fix it in the windows impl for timestamp (comment 5).
Assignee: nobody → honzab.moz
Status: UNCONFIRMED → ASSIGNED
Component: DOM → XPCOM
Ever confirmed: true
Attached patch v1 (obsolete) — Splinter Review
GetTickCount(64) is protected against going back when used on the same thread.  QueryPerfomanceCounter (on non-stable TSC) can jitter (and go back) and for some small amounts we ignore that jitter.  

This patch adds a simple |return lock:last = max(last, now())|-like atomic protection.

Note that sTimeStampLock is a fast spin lock, so it's presumed to be cheap.  Also, there is not much hardware with non-stable TSC these days.
Attachment #8873091 - Flags: review?(nfroyd)
Comment on attachment 8873091 [details] [diff] [review]
v1

Review of attachment 8873091 [details] [diff] [review]:
-----------------------------------------------------------------

I don't think this is the correct fix, but I am happy to discuss.  Perhaps it'd be worthwhile to do anyway in combination with the new check suggested below?

::: mozglue/misc/TimeStamp_windows.cpp
@@ +199,5 @@
>    LARGE_INTEGER pc;
>    ::QueryPerformanceCounter(&pc);
> +
> +  if (!sHasStableTSC) {
> +    // This is a simple go-backward protection for faulty hardware

It's not clear to me that we're dealing with faulty hardware here.  The HasStableTSC check returns false for non-Intel CPUs, and comment 0 says the problem only happens on an AMD CPU.  Perhaps the check in HasStableTSC should be made more robust instead?  (And we might realize speedups on non-Intel CPUs too?)
Attachment #8873091 - Flags: review?(nfroyd) → review-
(In reply to Nathan Froyd [:froydnj] from comment #9)
> It's not clear to me that we're dealing with faulty hardware here.  The
> HasStableTSC check returns false for non-Intel CPUs, and comment 0 says the
> problem only happens on an AMD CPU.  Perhaps the check in HasStableTSC
> should be made more robust instead?  (And we might realize speedups on
> non-Intel CPUs too?)

I must have misread that function.  I though that we return true for all Intels and check APM for non-Intels.  But it's opposite.

According [1] AMDs advert APM too, so no reason to claim "all AMDs don't have stable TSC, ever" IMO.

Still I think the don't-go-back logic is something we should enforce on non-APM systems.

[1] https://en.wikipedia.org/wiki/Time_Stamp_Counter#Implementation_in_various_processors
Flags: needinfo?(nfroyd)
(In reply to Honza Bambas (:mayhemer) from comment #10)
> (In reply to Nathan Froyd [:froydnj] from comment #9)
> > It's not clear to me that we're dealing with faulty hardware here.  The
> > HasStableTSC check returns false for non-Intel CPUs, and comment 0 says the
> > problem only happens on an AMD CPU.  Perhaps the check in HasStableTSC
> > should be made more robust instead?  (And we might realize speedups on
> > non-Intel CPUs too?)
> 
> I must have misread that function.  I though that we return true for all
> Intels and check APM for non-Intels.  But it's opposite.

...where "opposite" here means "we return false for all non-Intel chips" and check APM for Intel, correct?  Just verifying we're on the same page here.

> According [1] AMDs advert APM too, so no reason to claim "all AMDs don't
> have stable TSC, ever" IMO.

I am confused by this statement.  I agree that we shouldn't just blindly assume that all AMD processors have a stable TSC, merely that we should check APM support *and* stable TSC support on at least AMD processors in addition to doing so on Intel chips.  So the logic would become:

  // Get cpuid CPU string
  ...
  if (!intel && !amd) {
    // assume stable TSC is not supported
    return false;
  }

  // Check APM/stable TSC bit
  ...

For AMD chips that support stable TSC values, this would also increase performance, which seems valuable.

> Still I think the don't-go-back logic is something we should enforce on
> non-APM systems.

I think that's probably valuable.  But we can do both in this bug, yes?
Flags: needinfo?(nfroyd)
(In reply to Nathan Froyd [:froydnj] from comment #11)
> (In reply to Honza Bambas (:mayhemer) from comment #10)
> > (In reply to Nathan Froyd [:froydnj] from comment #9)
> > > It's not clear to me that we're dealing with faulty hardware here.  The
> > > HasStableTSC check returns false for non-Intel CPUs, and comment 0 says the
> > > problem only happens on an AMD CPU.  Perhaps the check in HasStableTSC
> > > should be made more robust instead?  (And we might realize speedups on
> > > non-Intel CPUs too?)
> > 
> > I must have misread that function.  I though that we return true for all
> > Intels and check APM for non-Intels.  But it's opposite.
> 
> ...where "opposite" here means "we return false for all non-Intel chips" and
> check APM for Intel, correct?  Just verifying we're on the same page here.

Yes.

> 
> > According [1] AMDs advert APM too, so no reason to claim "all AMDs don't
> > have stable TSC, ever" IMO.
> 
> I am confused by this statement.  I agree that we shouldn't just blindly
> assume that all AMD processors have a stable TSC, merely that we should
> check APM support *and* stable TSC support on at least AMD processors in
> addition to doing so on Intel chips.  So the logic would become:

yes!

> > Still I think the don't-go-back logic is something we should enforce on
> > non-APM systems.
> 
> I think that's probably valuable.  But we can do both in this bug, yes?

yes, we can.
Attached patch v2Splinter Review
- hasStableTSC returns:
  - false for non-intel and non-AMD processors
  - false when there is no way to detect Advanced Power Management
  - false when APM claims the TSC is not constant-rate
  - true otherwise
- when we run on a hardware w/o a stable TSC QueryPerformanceCounter is protected against going backwards (I don't think we need to do this when TSC is advertised stable)


https://treeherder.mozilla.org/#/jobs?repo=try&revision=ea8b9d8c485615280318e1e4b8436951323e807e
Attachment #8873091 - Attachment is obsolete: true
Attachment #8877125 - Flags: review?(nfroyd)
Comment on attachment 8877125 [details] [diff] [review]
v2

Review of attachment 8877125 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great, thank you!
Attachment #8877125 - Flags: review?(nfroyd) → review+
somnia, could you please verify if this this fixed the problem?  There are builds at https://archive.mozilla.org/pub/firefox/try-builds/honzab.moz@firemni.cz-ea8b9d8c485615280318e1e4b8436951323e807e/

thanks.
Flags: needinfo?(somnia)
[Tracking Requested - why for this release]: TimeStamps are pretty fundamental to a lot of things we do inside Gecko, and making them more robust seems reasonable.
Since 54 was released and this issue has been there for a while. We may not take this in 54 if we end up having 54 dot release.
Going to land w/o original problem verification.
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/4db2da03c5fb
Protect TimeStamp::Now() from going backwards on hardware with non-stable TSC. r=nfroyd
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/4db2da03c5fb
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Time going backwards sounds bad.  Honza, do you want this to bake in nightly for a bit or should we get it in beta asap?
Flags: needinfo?(honzab.moz)
I'd rather let this bake on m-c.  It's a sensitive change.  When something goes wrong, it's hard to identify that timestamp is the cause.  Also, we lived with this for number of releases ;)
Flags: needinfo?(honzab.moz)
Alright, wontfix for 55, thanks!
Flags: needinfo?(somnia)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: