Closed Bug 1830954 Opened 1 year ago Closed 1 month ago

Expose crashes which were likely accessing a guard page.

Categories

(Socorro :: General, task, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: afranchuk, Assigned: willkg)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

In bug 1663983, minidump-stackwalk was improved to flag memory accesses which were likely accessing a guard page. It may be useful to use this new information to flag crashes as having higher security implications (as guard page access may imply a buffer overflow, for example). This information isn't always available: currently it only works for architectures which we disassemble, which is just x86_64.

Which version of minidump-stackwalk has this improvement?

Flags: needinfo?(afranchuk)

No version has been tagged yet with this improvement, so no immediate action can be taken. I believe the next version will be 0.17.0 due to the reorganization of the crates in the repository, so when that is tagged that will be the minimum required version for this bug to proceed.

Flags: needinfo?(afranchuk)

Version 0.17 was released a while ago with this feature. It is now possible to determine whether a memory access was hitting a likely guard page: https://github.com/rust-minidump/rust-minidump/blob/6714a70437db2789b5106395223a23d3fb0fdbed/minidump-processor/json-schema.md#L146

Will, this field is now in the released version. It would be useful to be able to search for crashes with it set as it may identify security bugs.

Flags: needinfo?(willkg)

I need to update minidump-stackwalk in Socorro.

is_likely_guard_page indicates security implications, so it needs to be a protected field.

A minidump output will have a memory_accesses field which is an array of structures each of which may have a is_likely_guard_page field. How do you want me to process that? Should we do something like:

  • guard_page = True iff at least one is_likely_guard_page is True
  • guard_page = False if there are no is_likely_guard_page fields or all of them are False

If not that, how do you want me to process this?

Do you want guard_page to show up in the Details page of the crash report view?

Flags: needinfo?(willkg)
Flags: needinfo?(gsvelto)
Flags: needinfo?(gpascutto)
Flags: needinfo?(afranchuk)
Depends on: 1862730

I think your suggestion for the guard_page value should be sufficient. If at least one memory access hits the guard page, that's an indication that the running instruction was interacting with that memory (which it definitely shouldn't be). There are contrived situations where is_likely_guard_page could be a false positive (hence the "likely"), but mapped memory without any r/w/x is uncommon for other purposes.

I think the Details page should be fine for this information. It would also be useful to search/filter on this (I'm not sure whether that's an extra development step or whether it comes for free if it's made into a Details entry).

Similar to bit-flips, is there some way we could lift this to the signature level as well? Is there already logic to have some protected information in signatures? Ultimately we want the triage of the signature as security-sensitive to be sped up, so having this information exposed there in some way would be helpful.

Flags: needinfo?(afranchuk)

I can do everything suggested in comments #5 and #6.

  1. guard_page (or some other name--TBD) is true iff there's at least one is_likely_guard_page = True
  2. guard_page will be marked as protected data
  3. I'll add a field to the Details tab in the report view for guard_page
  4. I'll make guard_page searchable and facetable

Regarding signatures, we should create a new bug for signature generation changes. My gut feeling is that since signatures are public we shouldn't put protected data information in a signature, but maybe there's a way to "generically" change the signature like with a generic prefix like "important" or "p1" or something like that. We would figure that out in the new bug.

I'm going to nix the needinfo for gsvelto and gcp. If you want to add something, please do, but I have the answers I need to proceed.

Assignee: nobody → willkg
Status: NEW → ASSIGNED
Flags: needinfo?(gsvelto)
Flags: needinfo?(gpascutto)
Priority: -- → P2

I'm sorry, maybe I'm not using the correct terminology. I didn't meant to suggest that the signature itself contained this information, but instead that the summary information for a signature could indicate that there are crashes where a guard page was being accessed (aggregating the information from the crashes with that signature). In this case, I'm unsure of whether that should be done if any single crash accessed a guard page or whether a certain proportion should be needed to qualify. This is an imprecise indicator (like bit flips), but is meant to give a high-level indication/hint. In the case of security-sensitive things, getting a hint sooner rather than later to err on the cautious side is desirable.

Oh, I think I understand what you're asking for.

With the changes I'm proposing, users will be able to add is_likely_guard_page as an aggregation to a signature report. For example, the aggregations on this page:

https://crash-stats.mozilla.org/signature/?product=Firefox&date=%3E%3D2024-02-13T21%3A49%3A00.000Z&date=%3C2024-02-20T21%3A49%3A00.000Z&_sort=-date&signature=shutdownhang%20%7C%20std%3A%3Asys%3A%3Awindows%3A%3Alocks%3A%3Amutex%3A%3AMutex%3A%3Alock#aggregations

However, I think you're asking for an indicator on the signature report summary tab much like the ones we have for "potential startup crash". Is that right? If so, I'd need to know what labels are helpful and what heuristics to use for determining them.

Flags: needinfo?(afranchuk)

(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #9)

However, I think you're asking for an indicator on the signature report summary tab much like the ones we have for "potential startup crash". Is that right?

Yeah, we'd like the same thing for potential bit-flips too. In general it's very useful for this kind of signals where we know that only a certain amount of crashes will have a specific condition but we can infer something about the whole signature from it.

(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #9)

If so, I'd need to know what labels are helpful and what heuristics to use for determining them.

This is definitely up for debate. On the one hand, it ought to be fairly uncommon for a guard page to be accessed by correct code (so it should be a high signal-to-noise indicator). And the implication of such an access being security-sensitive is also fairly important; it's not something we want to ignore. So I think the heuristic we use should be somewhat aggressive. I'd like to trawl through some data to see whether the assumed SNR is correct; it's possible that e.g. bit flips might affect this indicator more than I think. I'll report my findings and we can use that to inform the heuristic. It may end up being the most accurate as a composite of a few factors.

Flags: needinfo?(afranchuk)

How about we spin off the signature report indicator to a new bug. Then I'll do the work in this bug and we can see where things are at and it'll be easier to figure out a heuristic.

Does that sound ok?

Blocks: 1881546

Sounds good, I just created bug 1881546 for this purpose.

Example crash reports with is_likely_guard_page once we reprocess them with stackwalker v0.20.0:

  • 8dd52551-692d-45ff-82d0-9ab000240228
  • a5875ef2-5cc1-4fdd-bbab-0b1400240229
  • 7c66167c-de29-4b8f-85c3-1c7db0240228
  • 456b3a73-1607-4487-9814-929700240228
  • c52e0b31-5a34-4937-a922-b97c30240228

We talked on #crashreporting and agreed on using has_guard_page_access as the field name. Previously in this bug, I used guard_page and is_likely_guard_page.

This deployed to prod just now in bug #1886008.

The new field won't be searchable until next week after a new search index is built. Keeping this open until then.

I did some searches with it. Marking as FIXED.

Status: ASSIGNED → RESOLVED
Closed: 1 month ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: