Flag crash reports with stale microcode
Categories
(Socorro :: Webapp, enhancement)
Tracking
(Not tracked)
People
(Reporter: hsivonen, Unassigned)
References
(Depends on 1 open bug)
Details
Investigating crash reports that are caused by CPU bugs is a potential time sink. On Windows, we already collect the microcode version (bug 1305120). However, the problem is more likely to occur on Linux (report collection: bug 1791726; back end: bug 1320921).
I suggest tracking the latest microcode version for each CPU model (we might automatically infer this from the crash reports that we get!) and clearly flagging crash reports that were generated on older microcode for the given CPU.
Comment 1•2 years ago
We probably don't want to keep the list by hand, but here's a site that gathers them, maybe we could find a way to extract the list from it: https://github.com/platomav/CPUMicrocodes
Comment 2•2 years ago
There's also something worth mentioning about Windows: the registry exposes both the current microcode version and the version the CPU booted with. This makes it possible to detect users who don't have BIOS updates applied but get their microcode from Windows. These users might suffer from bugs caused by the fact that some areas of a CPU initialized by the boot microcode cannot be overridden later by a newer one. I don't know if the same info is also available on Linux. It's not in /proc/cpuinfo, so my guess is that it's likely not, unless it can be found by parsing the output of dmesg.
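As a rough illustration of the check described above, here is a minimal sketch that compares the boot-time and current microcode revisions once both are known. The function name, the integer revision values, and the returned labels are all hypothetical, and the sketch assumes that a higher revision number means newer microcode.

```python
def microcode_source(boot_revision, current_revision):
    """Guess where the running microcode came from.

    Both arguments are integer revisions (hypothetical inputs for this
    sketch); None means the value was not available in the report.
    """
    if boot_revision is None or current_revision is None:
        return "unknown"
    if current_revision > boot_revision:
        # The OS loaded something newer than what the firmware provided,
        # so the BIOS is probably missing updates.
        return "os-updated"
    # The running microcode is what the firmware booted with.
    return "firmware"
```

A report classified as "os-updated" would be a candidate for the class of bugs mentioned above, where state initialized by the boot microcode is not fixed by the later update.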
Comment 3•2 years ago (Reporter)
(In reply to Gabriele Svelto [:gsvelto] from comment #1)
> We probably don't want to keep the list by hand, but here's a site that gathers them, maybe we could find a way to extract the list from it: https://github.com/platomav/CPUMicrocodes
As noted in comment 0, I think we could have the crash stats server infer the latest microcode for a given CPU info by looking across the other crash submissions that we get. (That probably needs some threshold of N distinct submitters having submitted a new microcode version for it to count so that some prankster can't just submit a single fake crash report to mess it up.)
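The inference with a submitter threshold could look roughly like the sketch below. It is not Socorro code: the tuple layout, the `submitter_id` notion, and the default threshold of 3 are assumptions made for illustration, and it assumes revisions are integers where a higher value means newer microcode.

```python
from collections import defaultdict

# Hypothetical threshold: distinct submitters needed before a revision counts.
MIN_DISTINCT_SUBMITTERS = 3


def infer_latest_microcode(reports, min_submitters=MIN_DISTINCT_SUBMITTERS):
    """Map each CPU identifier to the highest microcode revision that at
    least `min_submitters` distinct submitters have reported.

    `reports` is an iterable of (cpu_info, revision, submitter_id) tuples;
    this layout is an assumption for the sketch, not a real schema.
    """
    submitters = defaultdict(set)  # (cpu_info, revision) -> submitter ids
    for cpu_info, revision, submitter_id in reports:
        submitters[(cpu_info, revision)].add(submitter_id)

    latest = {}
    for (cpu_info, revision), ids in submitters.items():
        if len(ids) < min_submitters:
            continue  # too few distinct submitters; could be a fake report
        if revision > latest.get(cpu_info, -1):
            latest[cpu_info] = revision
    return latest
```

Counting distinct submitters rather than raw reports means that repeated submissions from one source do not push a revision over the threshold.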
Comment 4•2 years ago
(In reply to Henri Sivonen (:hsivonen) from comment #3)
> As noted in comment 0, I think we could have the crash stats server infer the latest microcode for a given CPU info by looking across the other crash submissions that we get. (That probably needs some threshold of N distinct submitters having submitted a new microcode version for it to count so that some prankster can't just submit a single fake crash report to mess it up.)
Ah yes, that's a way of doing it, though I don't know how much work that would require within Socorro.
Comment 5•2 years ago
My first-blush thoughts are that inferring the latest microcode version is probably doable and Crash Stats does something like this to figure out featured versions already. Once we have that information, using it to flag crash reports with old microcode is doable.
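For the flagging step, a minimal sketch might look like this. The dictionary keys `cpu_info` and `cpu_microcode_version` are used here as placeholder names, the revisions are assumed to be integers, and `latest_by_cpu` stands for whatever mapping the inference step produces; none of this is taken from the actual Socorro implementation.

```python
def has_stale_microcode(crash_report, latest_by_cpu):
    """Return True if the report's microcode is older than the latest known
    revision for its CPU, False if it is not, and None if this can't be told.

    `crash_report` is a dict with placeholder keys; `latest_by_cpu` maps a
    CPU identifier to its latest known integer revision.
    """
    cpu_info = crash_report.get("cpu_info")
    revision = crash_report.get("cpu_microcode_version")
    if cpu_info is None or revision is None:
        return None  # the report did not include the needed fields
    latest = latest_by_cpu.get(cpu_info)
    if latest is None:
        return None  # no trusted revision known for this CPU yet
    return revision < latest
```

Returning None for the unknown case keeps "not stale" distinct from "no data", which matters on platforms where the microcode version is not collected.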
I don't know when I can get to this.
Do we have any idea of the impact of this? For example, if we added this flag, how many engineer hours would it save a month?