Closed
Bug 1345939
Opened 7 years ago
Closed 7 years ago
javascript: performance.now malfunctions / returns bad values, on AMD Phenom II
Categories
(Core :: XPCOM, defect)
Tracking
()
People
(Reporter: somnia, Assigned: mayhemer)
Details
Attachments
(2 files, 1 obsolete file)
658 bytes,
text/html
|
Details | |
2.14 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0 Build ID: 20161215004017 Steps to reproduce: Run a short javascript program. File attached. Wait for about 2 minutes. The text "pos" turns into "error:NEG". Note: Wrong result ONLY on Windows 7 + Machine A (Phenom II 955, AMD 770). On two other machines (Intel) + Win7, results are OK. On Linux + Machine A, OK. On Google Chrome + Win7 + Machine A, results OK. On Firefox 52 + Win7 + Machine A, FAIL. On Firefox Beta 53.0b1 64-bit + Win7 + Machine A, FAIL. On Firefox DE 52.0a2 + Win7 + Machine A, FAIL. Note: I have noticed this bug earlier, it is there for at least a year, if not much longer. Actual results: Wrong result of execution. Wait for about 2 minutes. The text "pos" turns into "error:NEG" (because performance.now() returned a negative number). Afterwards, the number "tm" is 16 most of the time, indicating that performance.now() downgraded to video refresh instead of High Performance Timer. Expected results: The text "pos" should have remained visible. "error:NEG" should not be displayed. The number "tm" should be oscillating, usually around 4 (ms) on desktops. There is no reason for the number "tm" to be 16 (ms) if High Perfomance Timer is available.
Note: to reproduce the bug, restart Firefox before running the attached file. Otherwise, "tm" might already be 16 on start and the bug won't show itself.
Correction: performance.now() doesn't return a negative number. What the program is detecting is when performance.now() goes backward in time.
Comment 3•7 years ago
|
||
Honza, do you know what might be going on with the timers on this processor?
Flags: needinfo?(honzab.moz)
Updated•7 years ago
|
Component: JavaScript Engine → DOM
Assignee | ||
Comment 4•7 years ago
|
||
(In reply to Ben Kelly [not reviewing due to deadline][:bkelly] from comment #3) > Honza, do you know what might be going on with the timers on this processor? What is the underlying implementation of performance.now()?
Flags: needinfo?(honzab.moz) → needinfo?(bkelly)
Assignee | ||
Comment 5•7 years ago
|
||
Note that TimeStamp::Now() is not protected on windows to go backward. Should be added at [1] when TSC is not stable. [1] https://dxr.mozilla.org/mozilla-central/rev/34585620e529614c79ecc007705646de748e592d/mozglue/misc/TimeStamp_windows.cpp#199
Comment 6•7 years ago
|
||
TimeStamp::Now() and compared to a timestamp taken at start of navigation: https://dxr.mozilla.org/mozilla-central/source/dom/performance/Performance.cpp#146 The creation timestamp comes from here for a window: https://dxr.mozilla.org/mozilla-central/source/dom/base/nsDOMNavigationTiming.cpp#77
Flags: needinfo?(bkelly)
Assignee | ||
Comment 7•7 years ago
|
||
(In reply to Ben Kelly [not reviewing due to deadline][:bkelly] from comment #6) > TimeStamp::Now() and compared to a timestamp taken at start of navigation: > > https://dxr.mozilla.org/mozilla-central/source/dom/performance/Performance. > cpp#146 > > The creation timestamp comes from here for a window: > > https://dxr.mozilla.org/mozilla-central/source/dom/base/ > nsDOMNavigationTiming.cpp#77 Thanks. I'll fix it in the windows impl for timestamp (comment 5).
Assignee: nobody → honzab.moz
Status: UNCONFIRMED → ASSIGNED
Component: DOM → XPCOM
Ever confirmed: true
Assignee | ||
Comment 8•7 years ago
|
||
GetTickCount(64) is protected against going back when used on the same thread. QueryPerfomanceCounter (on non-stable TSC) can jitter (and go back) and for some small amounts we ignore that jitter. This patch adds a simple |return lock:last = max(last, now())|-like atomic protection. Note that sTimeStampLock is a fast spin lock, so it's presumed to be cheap. Also, there is not much hardware with non-stable TSC these days.
Attachment #8873091 -
Flags: review?(nfroyd)
Comment 9•7 years ago
|
||
Comment on attachment 8873091 [details] [diff] [review] v1 Review of attachment 8873091 [details] [diff] [review]: ----------------------------------------------------------------- I don't think this is the correct fix, but I am happy to discuss. Perhaps it'd be worthwhile to do anyway in combination with the new check suggested below? ::: mozglue/misc/TimeStamp_windows.cpp @@ +199,5 @@ > LARGE_INTEGER pc; > ::QueryPerformanceCounter(&pc); > + > + if (!sHasStableTSC) { > + // This is a simple go-backward protection for faulty hardware It's not clear to me that we're dealing with faulty hardware here. The HasStableTSC check returns false for non-Intel CPUs, and comment 0 says the problem only happens on an AMD CPU. Perhaps the check in HasStableTSC should be made more robust instead? (And we might realize speedups on non-Intel CPUs too?)
Attachment #8873091 -
Flags: review?(nfroyd) → review-
Assignee | ||
Comment 10•7 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #9) > It's not clear to me that we're dealing with faulty hardware here. The > HasStableTSC check returns false for non-Intel CPUs, and comment 0 says the > problem only happens on an AMD CPU. Perhaps the check in HasStableTSC > should be made more robust instead? (And we might realize speedups on > non-Intel CPUs too?) I must have misread that function. I though that we return true for all Intels and check APM for non-Intels. But it's opposite. According [1] AMDs advert APM too, so no reason to claim "all AMDs don't have stable TSC, ever" IMO. Still I think the don't-go-back logic is something we should enforce on non-APM systems. [1] https://en.wikipedia.org/wiki/Time_Stamp_Counter#Implementation_in_various_processors
Flags: needinfo?(nfroyd)
Comment 11•7 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #10) > (In reply to Nathan Froyd [:froydnj] from comment #9) > > It's not clear to me that we're dealing with faulty hardware here. The > > HasStableTSC check returns false for non-Intel CPUs, and comment 0 says the > > problem only happens on an AMD CPU. Perhaps the check in HasStableTSC > > should be made more robust instead? (And we might realize speedups on > > non-Intel CPUs too?) > > I must have misread that function. I though that we return true for all > Intels and check APM for non-Intels. But it's opposite. ...where "opposite" here means "we return false for all non-Intel chips" and check APM for Intel, correct? Just verifying we're on the same page here. > According [1] AMDs advert APM too, so no reason to claim "all AMDs don't > have stable TSC, ever" IMO. I am confused by this statement. I agree that we shouldn't just blindly assume that all AMD processors have a stable TSC, merely that we should check APM support *and* stable TSC support on at least AMD processors in addition to doing so on Intel chips. So the logic would become: // Get cpuid CPU string ... if (!intel && !amd) { // assume stable TSC is not supported return false; } // Check APM/stable TSC bit ... For AMD chips that support stable TSC values, this would also increase performance, which seems valuable. > Still I think the don't-go-back logic is something we should enforce on > non-APM systems. I think that's probably valuable. But we can do both in this bug, yes?
Flags: needinfo?(nfroyd)
Assignee | ||
Comment 12•7 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #11) > (In reply to Honza Bambas (:mayhemer) from comment #10) > > (In reply to Nathan Froyd [:froydnj] from comment #9) > > > It's not clear to me that we're dealing with faulty hardware here. The > > > HasStableTSC check returns false for non-Intel CPUs, and comment 0 says the > > > problem only happens on an AMD CPU. Perhaps the check in HasStableTSC > > > should be made more robust instead? (And we might realize speedups on > > > non-Intel CPUs too?) > > > > I must have misread that function. I though that we return true for all > > Intels and check APM for non-Intels. But it's opposite. > > ...where "opposite" here means "we return false for all non-Intel chips" and > check APM for Intel, correct? Just verifying we're on the same page here. Yes. > > > According [1] AMDs advert APM too, so no reason to claim "all AMDs don't > > have stable TSC, ever" IMO. > > I am confused by this statement. I agree that we shouldn't just blindly > assume that all AMD processors have a stable TSC, merely that we should > check APM support *and* stable TSC support on at least AMD processors in > addition to doing so on Intel chips. So the logic would become: yes! > > Still I think the don't-go-back logic is something we should enforce on > > non-APM systems. > > I think that's probably valuable. But we can do both in this bug, yes? yes, we can.
Assignee | ||
Comment 13•7 years ago
|
||
- hasStableTSC returns: - false for non-intel and non-AMD processors - false when there is no way to detect Advanced Power Management - false when APM claims the TSC is not constant-rate - true otherwise - when we run on a hardware w/o a stable TSC QueryPerformanceCounter is protected against going backwards (I don't think we need to do this when TSC is advertised stable) https://treeherder.mozilla.org/#/jobs?repo=try&revision=ea8b9d8c485615280318e1e4b8436951323e807e
Attachment #8873091 -
Attachment is obsolete: true
Attachment #8877125 -
Flags: review?(nfroyd)
Comment 14•7 years ago
|
||
Comment on attachment 8877125 [details] [diff] [review] v2 Review of attachment 8877125 [details] [diff] [review]: ----------------------------------------------------------------- Looks great, thank you!
Attachment #8877125 -
Flags: review?(nfroyd) → review+
Assignee | ||
Comment 15•7 years ago
|
||
somnia, could you please verify if this this fixed the problem? There are builds at https://archive.mozilla.org/pub/firefox/try-builds/honzab.moz@firemni.cz-ea8b9d8c485615280318e1e4b8436951323e807e/ thanks.
Flags: needinfo?(somnia)
Comment 16•7 years ago
|
||
[Tracking Requested - why for this release]: TimeStamps are pretty fundamental to a lot of things we do inside Gecko, and making them more robust seems reasonable.
status-firefox54:
--- → affected
status-firefox55:
--- → affected
status-firefox56:
--- → affected
status-firefox-esr45:
--- → wontfix
status-firefox-esr52:
--- → affected
tracking-firefox54:
--- → ?
tracking-firefox55:
--- → ?
tracking-firefox56:
--- → ?
Comment 17•7 years ago
|
||
Since 54 was released and this issue has been there for a while. We may not take this in 54 if we end up having 54 dot release.
Assignee | ||
Comment 18•7 years ago
|
||
Going to land w/o original problem verification.
Keywords: checkin-needed
Comment 19•7 years ago
|
||
Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/4db2da03c5fb Protect TimeStamp::Now() from going backwards on hardware with non-stable TSC. r=nfroyd
Keywords: checkin-needed
Comment 20•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/4db2da03c5fb
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Comment 21•7 years ago
|
||
Time going backwards sounds bad. Honza, do you want this to bake in nightly for a bit or should we get it in beta asap?
Assignee | ||
Comment 22•7 years ago
|
||
I'd rather let this bake on m-c. It's a sensitive change. When something goes wrong, it's hard to identify that timestamp is the cause. Also, we lived with this for number of releases ;)
Flags: needinfo?(honzab.moz)
Comment 23•7 years ago
|
||
Alright, wontfix for 55, thanks!
Assignee | ||
Updated•6 years ago
|
Flags: needinfo?(somnia)
You need to log in
before you can comment on or make changes to this bug.
Description
•