Open Bug 1881546 Opened 3 months ago Updated 3 months ago

Show an indicator on signature summaries if associated crash reports access guard pages

Categories

(Socorro :: General, enhancement)

Tracking

(Not tracked)

People

(Reporter: afranchuk, Unassigned)

Heuristics around when to show the indicator are to be determined. Such an indicator will assist in quick triage of security-sensitive bugs (as accessing guard pages is often a sign of buffer overflow). The indicator should be protected (only shown to authorized users) as it has security implications.

I've done a bit of exploration here.

  • Out of a sample of 1000 Win11 crashes, 5 had is_likely_guard_page set. 1000 isn't a large sample, and the result is inherently biased by whether our code in recent releases actually has any buffer overflows or other guard page accesses.
  • Of those 5, 4 also reported potential bit flips.
  • I chose one to look at more closely (d8c28511-3ef2-4509-b37c-96b270240222). The related signature has many crashes reporting is_likely_guard_page (62/85). One even has an EXCEPTION_GUARD_PAGE reason. The crashes that don't report it are either 32-bit (which doesn't support further analysis) or crash on a different instruction; all the others crash on the same instruction. Many also show potential bit flips for that address. All of the is_likely_guard_page crashes crash on the first byte of a page.
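The "first byte of a page" observation can be checked mechanically across a signature. The following is only a rough sketch: the 4 KiB page size is an assumption (the usual size on x86-64 Windows), the function names are made up for illustration, and the addresses are invented rather than taken from any crash report.

```python
from collections import Counter

# Assumption: 4 KiB pages, the usual page size on x86-64 Windows.
PAGE_SIZE = 4096


def page_offset(address: int, page_size: int = PAGE_SIZE) -> int:
    """Return the offset of a crashing address within its page."""
    return address % page_size


def offset_histogram(addresses):
    """Count how many crashing addresses fall at each page offset."""
    return Counter(page_offset(a) for a in addresses)


# Invented addresses, for illustration only.
addresses = [0x1F4A3000, 0x2B7C9000, 0x7FF61000, 0x7FF61008]
print(offset_histogram(addresses))
```

If nearly all crashes in a signature land on offset 0 (or on any single offset), that points toward a systematic access pattern rather than random hardware corruption.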

The likelihood of a true guard page access (as a result of buffer overflow) also being reported with potential bit flips is hard to determine, since it comes down to the memory mapping of the process. However, guard pages are often in heap memory, which one can argue has more memory mapped in the surrounding region, so the bit flip detection may trigger there and find matching bit-flipped mapped memory more frequently than elsewhere.

On the other hand, assuming a uniform probability distribution over which bit is flipped, a true bit flip causing a guard page access (to a guard page following a group of mapped pages) is not terribly likely, especially given that we only consider mapped memory within a fairly narrow size range as potential guard pages. That is to say, if the cause of a bug is hardware failure, you'd expect to see plenty of crashes with bit flips that don't report is_likely_guard_page. You also wouldn't expect the crashing address to be the first byte of a page (or, for that matter, a similar offset across all crashes), as is the case for the signature I examined above.

Given this information, I think we could use a heuristic along the lines of: at least 50% of the crashes in a signature have is_likely_guard_page set for some memory access. After that, a developer can get higher (or lower) confidence by further inspection. E.g. all instructions being the same and all addresses being at the same offset into a page are red flags. I would have suggested a higher threshold, but there seems to be a decent bit of noise in the crash reports themselves, at least in the example I inspected. This may be due to bad hardware (sadly that probably introduces noise across all of our signatures, proportional to CPU time in the relevant code) or perhaps another bug in the same function (which is less likely but still possible). If we want to account for the potential of 2 bugs in the same signature (assuming equal incidence, which is a big assumption) plus the noise of bad hardware, we might consider lowering that threshold a bit. But the point of this indicator is to give at-a-glance hints, so I don't think it's necessary to do that.
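To make the proposed threshold concrete, here is a minimal sketch of the check. It is not Socorro code: CrashSummary, guard_page_fraction and should_show_indicator are hypothetical names, and only is_likely_guard_page comes from the crash data discussed above. The 0.5 default reflects the 50% suggestion and is left as a parameter because the right value is still to be determined.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CrashSummary:
    # Hypothetical per-report record; only the field name
    # is_likely_guard_page comes from the crash data discussed above.
    is_likely_guard_page: bool


def guard_page_fraction(crashes: Iterable[CrashSummary]) -> float:
    """Fraction of a signature's crashes with is_likely_guard_page set."""
    crashes = list(crashes)
    if not crashes:
        return 0.0  # no reports, nothing to indicate
    flagged = sum(1 for c in crashes if c.is_likely_guard_page)
    return flagged / len(crashes)


def should_show_indicator(
    crashes: Iterable[CrashSummary], threshold: float = 0.5
) -> bool:
    """Show the indicator when the flagged fraction reaches the threshold."""
    return guard_page_fraction(crashes) >= threshold


# The signature inspected above: 62 of 85 crashes were flagged.
signature = [CrashSummary(True)] * 62 + [CrashSummary(False)] * 23
print(should_show_indicator(signature))  # True (62/85 is about 0.73)
```

A real implementation would presumably also restrict the result to authorized users and might require a minimum number of reports before showing anything; both are left out of this sketch.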

See Also: → 1883067