Open Bug 1515155 Opened 7 years ago Updated 7 months ago

Restrict the readtsc instability fix deployed in bug 1487778 only to CPUs in family 21 model 2

Categories

(Core :: XPCOM, enhancement)

enhancement

Tracking

()

People

(Reporter: ehsan.akhgari, Unassigned)

References

(Blocks 3 open bugs)

Details

(Whiteboard: [sp3])

In bug 1487778, we deployed a broad fix for the readtsc instability that we observed as a result of an assertion failure. See bug 1487778 comment 29. This bug tracks redoing that fix to make it more specific to this family type.
Blocks: 1487778
Tentatively assigning to me.
Assignee: nobody → honzab.moz

I apparently can't get to this, unblocking.

Assignee: honzab.moz → nobody
Severity: normal → S3
Whiteboard: [sp3]
Blocks: 1863455

This bug came up in the perf-sp3 channel on Matrix as having shown up in a profile of JetStream 3.

Rather trying to update our mitigations to more specifically target what we think were the affected platforms, I would propose that we instead simplify and just use QPC (ie no critical section in our code, no use of GetTickCount(), no logic to try and detect instability, no dual time stamp representations).

According to Microsoft's documentation, we should neither be checking what kind of hardware timing capabilities are present nor worrying about QPC() being non-monotonic and those are specifically what our additional code has been doing.

Microsoft also indicates that they've invested in getting QPC() right. They will also continue to make improvements as new hardware/Windows versions come out. We should avoid duplicating their work without a good reason.

10+ years QPC() was known to have issues and so that's probably why we added these mitigations in the first place. These complexities resulted in unexpected side effects and I suspect that what I reported in this comment may have been the final cause behind bug 1487778.

At any rate, I think we should be try to cut back and simplify rather than add complexity in this case. We could probably let it ride in Nightly for a bit first to see if any problems show up.

TLDR: Rather than add another layer of complexity to our time stamp implementation, we should just be using QPC() and let it do its job.

Here's cpuinfo from a family 21 model 2 CPU from https://reviews.llvm.org/D46314?id=144750:


Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           AuthenticAMD
CPU family:          21
Model:               2
Model name:          AMD FX(tm)-8350 Eight-Core Processor
Stepping:            0
CPU MHz:             3584.018
CPU max MHz:         4000.0000
CPU min MHz:         1400.0000
BogoMIPS:            8027.22
Virtualization:      AMD-V
L1d cache:           16K
L1i cache:           64K
L2 cache:            2048K
L3 cache:            8192K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold

It has constant_tsc and nonstop_tsc which suggests to me that we're going to have HasStableTSC() = true on this CPU

I think this also means that Chrome would've been seeing backwards timestamps from QPC on these CPUs.

See Also: → 1969719
You need to log in before you can comment on or make changes to this bug.