Open Bug 1523278 Opened 5 years ago Updated 3 months ago

Support PHC reports in Socorro

Categories

(Socorro :: General, task, P2)

Desktop
All

Tracking

(Not tracked)

ASSIGNED

People

(Reporter: decoder, Assigned: willkg)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Once we have GWP-ASan support (see bug 1523268 for description) in Firefox, we will likely submit crash reports to Socorro that have a special flag set and contain more than one backtrace for the crashing thread (and potentially other debug information).

It is not yet clear how this information will look like and will know more once the client implementation is done. Filing this now as a placeholder so we bugs up for all the tasks to be done.

As for priorities: The ability to find these reports in Socorro (i.e. filter/search for them) will be P1 while anything else (like displaying the additional information, etc.) will be lower priority P3. This is to ensure that we get the project up and running as quickly and as parallel as possible.

willkg: here's what the annotations added to the crash report look like.

PHCKind=FreedPage
PHCUsableSize=8
PHCBaseAddress=139870055796736
PHCAllocStack=94444341547431,139869878357019,139869878749633,139869878793643,139869878779846,139869878804361,139869878283200,13986987836706,139869878207449,139869889700150,139869889701146,139869889877080,139869889753769,139869927105465,139869878781930,139869879004190
PHCFreeStack=94444341549027,139869878357140,139869878749633,139869878793643,139869878779846,139869878804361,139869878283200,13986987836706,139869878207449,139869889700150,139869889701146,139869889877080,139869889753769,139869927105465,139869878781930,139869879004190

The first two are straightforward and can be shown in crash-stats without any transformation.

PHCBaseAddress is an address in decimal form. It'll need to be shown in crash-stats as hexadecimal. I figure that's probably easy.

PHCAllocStack and PHCFreeStack are the more challenging ones. Each one is a stack trace, represented as a comma-separated list of up to 16 decimal addresses, which will need to be symbolicated. The stack traces can come from the parent process or any content process.

IIRC when we discussed this earlier you said this would be a reasonable way to represent the stack traces, but I figure I'd double-check now that I have real sample data. Will Socorro be able to symbolicate these stack traces? Are there any changes I can make to simplify things on the Socorro side?

Flags: needinfo?(willkg)

I'm really sorry I haven't been able to get to this, yet. Figuring out the answer to "Will Socorro be able to symbolicate these stack traces?" is in my list of things to figure out.

Making this a P2 so it's on my radar.

Priority: -- → P2

We talked about this yesterday. Nathan said it's definitely doable using the information in the minidump to convert the memory addresses to (offset, module, debugid) and then symbolicate that.

I can add the bits we need to surface this information in the Crash Stats interface. I'll try to get to that soon, but it'll probably be in the next couple of weeks.

I don't think I can add symbolication code for a while. Maybe late 2019q3. If it turns out that it's really important/meaningful, I can try to reprioritize.

Are any of these considered PII? Stacks usually aren't. I don't think the other values look PII-ish, but figured I'd ask.

Is any of this information security-sensitive? If it was public, does that inform naughty people in ways we're not thrilled about?

Flags: needinfo?(willkg) → needinfo?(n.nethercote)

None of the fields are PII, but the some or all of the fields are security-sensitive -- when PHC fields are present they will be pointing right at use-after-free bugs.

Flags: needinfo?(n.nethercote)

Another question (I can't remember if we discussed this at Whistler): if we get crash reports with the PHC annotations present (as described in comment 1) is there a way to identify and find those crash reports in crash-stats, or otherwise? Because the first thing we'll want to know once we deploy PHC is whether we are actually getting any crash reports with PHC annotations in them.

In other words, I want an operation "show me all the crash reports that contain a PHCKind annotation". Is that possible? Thanks.

Flags: needinfo?(willkg)

I have written a proof-of-concept Python script that can symbolize PHC traces given the PHC extra data and the full symbols zip file for the respective build. It needs a bit more work but it might be a viable interim solution until we have fully symbolized data in Socorro.

Summary: Support GWP-ASan reports in Socorro → Support PHC reports in Socorro

re: comment #6, yes--we can make PHCKind available for search. As I understand it, you want something like the following:

  • show me all crash reports that have a PHCKind field
  • show me all crash reports where PHCKind field has value xyz

Nicholas: Is that right? Are there other variations?

Also, grabbing this to do that part of the bug this week.

Assignee: nobody → willkg
Status: NEW → ASSIGNED
Flags: needinfo?(willkg)
Flags: needinfo?(n.nethercote)

willkg: What you suggest would suffice, but we might as well make all five PHC annotations (PHCKind, PHCUsableSize, PHCBaseAddress, PHCAllocStack, PHCFreeStack) searchable, it seems like that wouldn't be much harder.

So, here's what I really want:

  • All five annotations are security-sensitive, and should not be visible/searchable unless you have minidump access.
  • For those with minidump access, all five annotations should be searchable in the normal way.
    • For PHCUsableSize that would be the usual integer search operators ("has terms", "<", ">", etc.).
    • For the other four annotations it would be the usual string search operators ("has terms", "contains", "is", "exists", etc.).
  • For those with minidump access, all five annotations should be shown in the "details" tab if they are present, but they should not be shown if they are absent, because most of the time they will be absent.

I have some familiarity with Socorro's code so I am happy to look over any patch you write. Thanks!

Flags: needinfo?(n.nethercote)

So, to sum up, I'm planning to do the following today:

  • PHCBaseAddress is an address in decimal form which needs to be converted to hex
  • make PHCKind, PHCUsableSize, PHCBaseAddress, PHCAllocStack, PHCFreeStack searchable per the requirements in comment #9

The rest I'm pushing off to another day.

Nicholas: Do you have a crash report I can use to test with?

Flags: needinfo?(n.nethercote)

John pointed out Christian mentioned this crash report:

bp-256d20c8-457f-4b78-8213-c71580190721

I'll use that.

Flags: needinfo?(n.nethercote)

One other thing: if the stacks can be symbolicated, that would be great. decoder has a prototype script that can symbolicate, but having Socorro do that automatically would make things simpler. (But the search functionality is more important.)

willkg merged PR #4989: "bug 1523278: add PHC field processing and searchability" in 7310df5.

This implements the things I can do now.

Symbolicating the PHCAllocStack and PHCFreeStack is more complicated. We could implement symbolication code, but I'd want to integrate it with minidump-stackwalk since that's downloading and caching symbols files already and we don't want this to impact processor performance and runtime.

Another way to do this is to convert the list of addresses into a form that the Tecken symbolication API uses and then do an API request. That might be easier to implement.

https://tecken.readthedocs.io/en/latest/symbolication.html

Thank you, willkg. I just tried to search for PHC annotations on crash-stats but was unable to, so I guess this hasn't been deployed yet. (I checked both the mozilla.com and allizom.org instances, and the deployment site is broken.)

We don't use the deployment site. This code is on stage, but not prod. I don't mark bugs FIXED until they're on prod. Even so, we need a new index, so this code won't work on either stage or prod until Sunday when a new index is created with the mapping changes.

Hope that helps!

This is on stage and prod now. We have to wait until 00:00 UTC on Sunday at which point Socorro will create a new Elasticsearch index with the new mappings which includes the phc_* fields. Until then, those fields are getting added by processor rules, but it's undefined as to whether they're searchable in any helpful way.

Keeping this open to verify everything is working correctly on Monday and also to symbolicate the PHCAllocField and PHCFreeField.

willkg: will all submitted crash reports with PHC fields be visible? Or will it be limited to those submitted after this change landed, or after the new index is created?

Flags: needinfo?(willkg)

We have to wait until 00:00 UTC on Sunday at which point Socorro will create a new Elasticsearch index with the new mappings which includes the phc_* fields. Until then, phc fields will be visible in the report view, but they probably won't be searchable in the way you'd like. It's undefined what searching for crash reports collected this week will do.

Flags: needinfo?(willkg)

I messed up. Turns out the indexes get created at 00:00 Monday UTC, so they've only been up for a while.

Looks like no crash reports have been collected and processed since yesterday with a PHCKind field:

https://crash-stats.mozilla.org/search/?phc_kind=%21__null__&product=Firefox&date=%3E%3D2019-07-22T13%3A39%3A00.000Z&date=%3C2019-07-29T13%3A39%3A00.000Z&_facets=signature&page=1&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform

Can one of you submit a new crash report with PHCKind and friends in it? Then we can verify this is working.

Flags: needinfo?(n.nethercote)
Flags: needinfo?(choller)

Here's one I cooked up:
https://crash-stats.mozilla.com/report/index/30278c36-0e15-4719-a672-ca6d40190729

We also have one non-fake one! Which I won't link to here, but those with sufficient privileges can find it easily by searching for crashes where the "phc kind" field exists.

Flags: needinfo?(n.nethercote)
Flags: needinfo?(choller)

Unassigning myself since I'm not going to get to this any time soon.

Assignee: willkg → nobody
Status: ASSIGNED → NEW

Grabbing this one to look at soon.

Assignee: nobody → willkg
Status: NEW → ASSIGNED

Summarizing an email Paul sent out earlier and a couple of conversations I had:

  1. Currently Mozilla has PHC rolled out to 1% of release population and we're intending to increase that to 10% in January. PHC has uncovered several memory buffer overruns and use-after-frees already.
  2. Writing bug reports regarding PHC stacks is currently a manual process since stacks need to be symbolicated before we can write up bug reports for them.
  3. This isn't going to scale as we increase the population to 10%. We need to automate reporting bugs.
  4. To include PHC bug reporting in bugbot (which automates everything), we need the stacks to be symbolicated and those symbolicated stacks available to bugbot.
  5. This effort works towards two tentative 2024 OKRs around security and stability for Firefox.

I'll try to work on this early next year once the change freeze is over.

I'm pretty sure we can use the information from the stackwalker output coupled with the PHC annotations to generate symbolication API requests and use Eliot to symbolicate the stacks. I'll start with trying to get that working.

After I make sure that will work, I'll talk with Suhaib to make sure the changes we're making generate the data he needs for bugbot to do its thing.

I've got a proof-of-concept for symbolicating the addresses. The processed crash currently has phc_alloc_stack and phc_free_stack fields both of which contain a comma-delimited list of memory addresses. I'll add a processor rule to symbolicate those and add phc_alloc_stack_symbolicated and phc_free_stack_symbolicated fields to the processed crash. Each field will be an array of frame structures like this:

{
    "frame": 0,
    "module": "xul.dll",
    "module_offset": "0x734e3c",
    "function": "mozilla::dom::JSActorManager::ReceiveRawMessage(mozilla::dom::JSActorMessageMeta const&, mozilla::Maybe<mozilla::dom::ipc::StructuredCloneData>&&, mozilla::Maybe<mozilla::dom::ipc::StructuredCloneData>&&)",
    "function_offset": "0x7dc",
    "file": "hg:hg.mozilla.org/mozilla-central:dom/ipc/jsactor/JSActorManager.cpp:55c63a6c547f1fecd412505a064f21fd1e1ec48e",
    "line": 172
},

This will be available in the ProcessedCrash API on Crash Stats and because it's protected data, it will require an API token for a user that has protected data access to access.

Suhaib: Does that work for your bugbot needs?

Flags: needinfo?(smujahid)

Suhaib: Does that work for your bugbot needs?

Yes, it should be fine.

Flags: needinfo?(smujahid)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: