Closed Bug 1626698 · Opened 4 years ago · Closed 2 years ago

schema for crash annotations in crash reports

Component: Socorro :: General · Type: task · Priority: P2
Tracking: not tracked
Reporter: willkg · Assignee: willkg
Blocks: 2 open bugs
Attachments: 1 file

Socorro has a schema for the crash report data it sends to Telemetry. However, it has no schema for the incoming crash reports it processes.

This creates a series of problems:

  1. It's hard for people working on crash reporters to validate the crash report data they're sending. There's no schema, so they have nothing to validate against.
  2. Because of that, Socorro often receives crash reports in which the same annotation shows up with different variations of its value.
  3. Because of that, the Socorro processor has to handle many different kinds of junk data, which leads to a lot of Sentry errors and processing bugs, and costs ongoing engineering time to fix.

For the last few years, we've talked about creating a schema for crash annotations and using that schema to validate crash data either in collection or processing.

This project covers:

  1. Creating the schema and tools to use it.
  2. Figuring out what to do with crash reports that aren't valid.
  3. Implementing infrastructure for handling invalid crash report data and for investigating that data.

We've talked about this for years, but haven't done any serious work on it yet.

Previously, there was no documentation for crash report annotations; it was a free-for-all. Thus, building an accurate schema was a massive project. That's much easier now since Gabriele and crew have built and maintained a CrashAnnotations.yaml file:

Some annotations are missing from it, but it covers a lot.

Another reason I haven't worked on this yet is that I have no idea what we should do with crash report data that doesn't validate. Should we reject it at collection? Should we collect it but reject it from processing? How do we investigate validation problems? How does someone else find out about validation problems? All of that needs to be figured out, and the various options have lots of pros and cons.

We have to think about:

  • the needs of people who are writing crash reporters and need to test crash reporting
  • the needs of people who have submitted crash reports and want to share those crash reports, perhaps to get support
  • the fact that when Firefox and other products are crashing, they might be in an iffy state and may send data that's partially corrupted

Telemetry has infrastructure for invalid data. We should look at what they're doing and apply it here if we can.

Priority: -- → P3

One of the stakeholders should be Gabriele. We should either base our schema on the CrashAnnotations.yaml file or derive it from that file. If we can avoid creating yet another thing that needs to be maintained, that'd be super.
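A minimal sketch of deriving a schema from CrashAnnotations.yaml instead of maintaining a separate one, then checking annotation values against it. It assumes the YAML has already been parsed into a dict (for example with a YAML library), that each entry has `description` and `type` keys, that values arrive as strings, and that booleans are sent as "0" or "1". The sample entries, the type names, and the `derive_schema` and `validate` helpers are hypothetical, not Socorro's actual code.

```python
# Hypothetical excerpt in the style of CrashAnnotations.yaml, already parsed
# into a dict. Descriptions and types are illustrative placeholders.
ANNOTATIONS_YAML = {
    "ProductName": {"description": "Product name.", "type": "string"},
    "OOMAllocationSize": {"description": "Size of a failed allocation.", "type": "integer"},
    "StartupCrash": {"description": "Whether the crash happened at startup.", "type": "boolean"},
}


def _is_integer(value):
    try:
        int(value)
    except ValueError:
        return False
    return True


# Assumption: annotation values arrive as strings; booleans are "0" or "1".
TYPE_CHECKS = {
    "string": lambda value: True,
    "integer": _is_integer,
    "boolean": lambda value: value in ("0", "1"),
}


def derive_schema(annotations):
    """Build a {name: {"description": ..., "check": ...}} schema from YAML entries."""
    return {
        name: {
            "description": entry.get("description", ""),
            "check": TYPE_CHECKS[entry["type"]],
        }
        for name, entry in annotations.items()
    }


def validate(raw_crash, schema):
    """Return a list of problems rather than rejecting the crash report outright."""
    problems = []
    for key, value in raw_crash.items():
        if key not in schema:
            problems.append(f"{key}: not in schema")
        elif not schema[key]["check"](value):
            problems.append(f"{key}: invalid value {value!r}")
    return problems


schema = derive_schema(ANNOTATIONS_YAML)
raw_crash = {"ProductName": "Firefox", "OOMAllocationSize": "12x", "StartupCrash": "true"}
print(validate(raw_crash, schema))
# ["OOMAllocationSize: invalid value '12x'", "StartupCrash: invalid value 'true'"]
```

Returning a list of problems, rather than raising on the first bad value, leaves open the questions above about whether to reject, collect, or just flag partially corrupted crash data for investigation.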

I think the above makes this sound like one big, all-or-nothing project. It should be broken down into smaller, self-contained phases that each move us forward.

I think the next step here would be to figure out the end state, then figure out small steps to get there and what needs to be implemented/changed at each step.

I started working on this in bug #1687987, where we're changing the collector to prefix the fields it adds at ingestion with "collector_", which differentiates them from crash annotations.

Making that bug block this one.
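A minimal sketch of what the "collector_" prefix makes possible: a raw crash can be split into annotations sent by the crash reporter and fields added by the collector at ingestion, using only a key check. The `split_raw_crash` helper and the field name `collector_example_field` are hypothetical and for illustration only.

```python
COLLECTOR_PREFIX = "collector_"


def split_raw_crash(raw_crash):
    """Separate crash annotations from fields the collector added at ingestion."""
    annotations = {}
    collector_fields = {}
    for key, value in raw_crash.items():
        if key.startswith(COLLECTOR_PREFIX):
            collector_fields[key] = value
        else:
            annotations[key] = value
    return annotations, collector_fields


# Hypothetical raw crash; "collector_example_field" is illustrative only.
raw_crash = {
    "ProductName": "Firefox",
    "Version": "110.0",
    "collector_example_field": "value",
}
annotations, collector_fields = split_raw_crash(raw_crash)
print(sorted(annotations))       # ['ProductName', 'Version']
print(sorted(collector_fields))  # ['collector_example_field']
```

With this split, a schema for crash annotations only has to describe what crash reporters send, not what the collector adds.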

Depends on: 1687987
Summary: [tracker] schema for crash reports → [tracker] schema for crash annotations in crash reports
Depends on: 1453394

I don't want to validate crash reports against schemas at collection time, so let's nix that idea.

However, we do need a schema for the following things:

  1. documentation and descriptions of the fields, gotchas, etc.
  2. tracking data reviews
  3. tracking related bugs
  4. permissions for the RawCrash API

Grabbing this to do soon.

Priority: P3 → P2
Summary: [tracker] schema for crash annotations in crash reports → schema for crash annotations in crash reports
Assignee: nobody → willkg

willkg merged PR #6259: "bug 1626698: raw crash schema" in 834acfd.

Next step is to test this out on stage and then figure out the order of follow-up work to do.

No longer depends on: 1453394

I just pushed this to production in bug #1803661. Holy moly, this is finally done.

Closed: 2 years ago
Resolution: --- → FIXED