Open Bug 1258832 Opened 8 years ago Updated 2 years ago

check for DRAM errors on crash

Categories

(Toolkit :: Crash Reporting, defect)

People

(Reporter: sfink, Unassigned)

Details

Terrence and the rest of the GC team have long discussed the possibility of doing memtester-style checks for bad DRAM, in an attempt to deal with the long tail of idiosyncratic crashes. We really don't know how many of them could be due to bad RAM, but there's a strong possibility that we (and many other teams) are wasting a substantial amount of time triaging such crashes. The Google study quotes the scary figure of roughly 8% of DIMMs being affected by errors per year; a summary is at http://www.zdnet.com/article/dram-error-rates-nightmare-on-dimm-street/

I see two main realizations of this, not mutually exclusive:

1) the crashreporter dialog box gives an option to do a memory scan (always, or just in a subset of cases that are more likely to be a result of hardware memory corruption). Optimally, clicking on that box would run a memory scan with elevated privileges so it can pin pages in physical RAM. It's ok if the scan takes a while.

2) when a crash occurs, a quick check for stuck bits (or similar problems) is run from within the original process's address space, covering easily-discoverable faulting memory. We'd hope to get the same physical pages, so we want to do it without perturbing things any more than necessary. The results of this scan would be added to the minidump extra info.

:bsmedberg, the GC team is pretty interested in having this functionality, but it's more of a breakpad thing and I believe breakpad-capable people are pretty swamped right now. What do you think of the value of this, and is there a way to resource it?
Flags: needinfo?(benjamin)

I don't think this is part of "breakpad" itself, but Firefox could do this as part of the larger crash reporting sequence. I'm happy to mentor, but this is something where you'd have to do the engineering work.
Flags: needinfo?(benjamin)
Severity: normal → S3