Closed Bug 1803558 Opened 2 months ago Closed 28 days ago

crash reporting data dictionary

Categories

(Socorro :: General, task, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: willkg, Assigned: willkg)

References

(Blocks 2 open bugs)

Details

Attachments

(6 files)

Telemetry has the Glean Dictionary, which provides a data dictionary for Glean fields.

Now that we have schemas for the raw and processed crash data (bug #1626698 and bug #1764395), we should expose the schema information to the Socorro docs so that we (finally) have field documentation.

These docs would include:

  • index page of all raw and processed crash fields
  • page for each individual field consisting of:
    • field name
    • field description
    • whether the field is indexed in super search and, if so, how (keyword, boolean, etc.)
    • data review urls
    • source annotation
    • data gotchas

For now, I think we should write a script that generates a whole bunch of pages in the Sphinx docs and check all of that into the repo. Then the documentation will show up at socorro.readthedocs.org and be searchable.
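A minimal sketch of what such a generator could look like, not the script that was actually written. It assumes the schema is a JSON-Schema-style dict with a top-level `properties` mapping. The per-field keys `search_type` and `data_reviews` and all function names are hypothetical; `source_annotation` is the key mentioned later in this bug.

```python
def render_field_page(name: str, field: dict) -> str:
    """Render one field as a reStructuredText page (hypothetical layout)."""
    lines = [name, "=" * len(name), ""]
    lines += [field.get("description", "(no description)"), ""]
    # The key names below are assumptions about how the schema stores metadata.
    rows = [
        ("Type", field.get("type", "unknown")),
        ("Source annotation", field.get("source_annotation", "none")),
        ("Super search type", field.get("search_type", "not indexed")),
    ]
    lines += [f":{label}: {value}" for label, value in rows]
    reviews = field.get("data_reviews", [])
    if reviews:
        lines += ["", "Data reviews:", ""]
        lines += [f"* {url}" for url in reviews]
    return "\n".join(lines) + "\n"


def render_index(schema: dict) -> str:
    """Render an index page with a toctree entry per field, sorted by name."""
    title = "Processed crash fields"
    lines = [title, "=" * len(title), "", ".. toctree::", "   :maxdepth: 1", ""]
    lines += [f"   {name}" for name in sorted(schema["properties"])]
    return "\n".join(lines) + "\n"


def generate_pages(schema: dict) -> dict[str, str]:
    """Map each output filename to its generated page content."""
    pages = {"index.rst": render_index(schema)}
    for name, field in schema["properties"].items():
        pages[f"{name}.rst"] = render_field_page(name, field)
    return pages
```

Returning a filename-to-content mapping rather than writing files directly keeps the rendering easy to test and lets the same output be reused by a freshness check.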

It does mean we have to do an additional step after changing schemas. We can add a check for that in CI. That'll be good enough for now.
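One way the CI check could work, sketched under assumptions: regenerate the pages in memory, compare them with what is checked in, and fail if anything differs. The function names and the `expected` mapping (filename to freshly generated content) are hypothetical, not Socorro's actual tooling.

```python
from pathlib import Path


def find_stale_pages(expected: dict[str, str], docs_dir: Path) -> list[str]:
    """Return filenames whose checked-in content is missing or out of date."""
    stale = []
    for filename, content in sorted(expected.items()):
        path = docs_dir / filename
        if not path.is_file() or path.read_text(encoding="utf-8") != content:
            stale.append(filename)
    return stale


def check(expected: dict[str, str], docs_dir: Path) -> int:
    """Print stale files and return a process exit code (0 means up to date)."""
    stale = find_stale_pages(expected, docs_dir)
    for filename in stale:
        print(f"out of date: {filename}")
    return 1 if stale else 0
```

A CI job would pass the result of `check(...)` to `sys.exit`. This sketch does not flag leftover pages for fields that were removed from the schema; that would need an extra pass over `docs_dir`.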

Assignee: nobody → willkg
Status: NEW → ASSIGNED
Summary: document raw and processed fields → crash reporting data dictionary

Everything so far was pushed to production just now in bug #1803661.

First pass looks good. Future work I'm mulling over:

  1. For annotations, I want to add the list of products that have emitted the annotation in the last 7 days.
  2. For annotations for which there's a processed crash field with a source_annotation, we can add a link to the processed crash field.
  3. Some processed crash fields specify a source_annotation and are normalized, validated, and copied over to the processed crash using the CopyFromRawCrashRule processor rule. Fields that are derived from annotations, or are a little more complex and so need a special rule, have no source_annotation, so the data dictionary can't document where the field data comes from. It'd be great if we could figure out something that doesn't require manual bookkeeping here.
  4. Some processed crash fields are objects and those are effectively not documented at all yet. How do we want to document them? For example, breadcrumbs, java_exception, memory_report, json_dump and friends, etc. Maybe we display a "friendly" version of the schema?
  5. The super search query type is opaque--it's not clear what it means. Should we add details about how you search with the field?
  6. There are some Socorro-y fields that probably aren't usable by other people. Should we show them in the data dictionary? For example, the collector_notes.

I think I'll split some/most of those and whatever other ideas come up from feedback next week.

Oh, and also:

  1. Redo the tables to use flex or grid or something that'll work with different viewport sizes.

willkg merged PR #6274: "bug 1803558: add link from annotation to processed field in docs" in b8a2002.

This implements item 2: for annotations where there's a processed crash field with a source annotation, there's now a link to the processed crash field.
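The lookup behind such a link can be sketched as a reverse index from annotation name to the processed crash fields that name it as their source. This is an illustration, not the code from the PR; it assumes a JSON-Schema-style dict with a `properties` mapping, and the function name is hypothetical.

```python
from collections import defaultdict


def annotation_to_fields(processed_schema: dict) -> dict[str, list[str]]:
    """Map each annotation to the processed crash fields that are sourced from it."""
    links = defaultdict(list)
    for name, field in processed_schema["properties"].items():
        annotation = field.get("source_annotation")
        if annotation:  # fields built by a special rule have no source annotation
            links[annotation].append(name)
    return {annotation: sorted(names) for annotation, names in links.items()}
```

When rendering the page for an annotation, the generator could look the annotation up in this mapping and emit a link for each processed field it finds.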

willkg merged PR #6280: "bug 1803558: add caveats to application_build_id description" in 62273cc.

This adds some gotcha notes for ApplicationBuildID and application_build_id.

Update from comment #3. I've done the following:

  1. Done: For annotations, I want to add the list of products that have emitted the annotation in the last 7 days.
  2. Done: For annotations for which there's a processed crash field with a source_annotation, we can add a link to the processed crash field.
  3. Done: Some processed crash fields are objects and those are effectively not documented at all yet. How do we want to document them? For example, breadcrumbs, java_exception, memory_report, json_dump and friends, etc. Maybe we display a "friendly" version of the schema?
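For item 3, one possible shape for a "friendly" view of an object field is a flat list of dotted paths with their types. The sketch below is only an illustration of that idea, not how the docs actually render object fields. It assumes JSON-Schema-style `properties` and `items` keys and a single string `type` per node; the function name is hypothetical.

```python
def flatten_schema(schema: dict, prefix: str = "") -> list[tuple[str, str]]:
    """Walk a nested schema and return (dotted path, type) rows in sorted order."""
    rows = []
    for name, sub in sorted(schema.get("properties", {}).items()):
        path = f"{prefix}.{name}" if prefix else name
        type_ = sub.get("type", "unknown")
        rows.append((path, str(type_)))
        if type_ == "object":
            # Recurse into nested object properties.
            rows.extend(flatten_schema(sub, path))
        elif type_ == "array" and isinstance(sub.get("items"), dict):
            # Describe array elements under a "[]" suffix.
            items = sub["items"]
            rows.append((f"{path}[]", str(items.get("type", "unknown"))))
            rows.extend(flatten_schema(items, f"{path}[]"))
    return rows
```

Each row can then be rendered as one line of a table on the field's page, which keeps deeply nested structures readable without showing raw schema JSON.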

These aren't done:

  1. Some processed crash fields specify a source_annotation and are normalized, validated, and copied over to the processed crash using the CopyFromRawCrashRule processor rule. Fields that are derived from annotations, or are a little more complex and so need a special rule, have no source_annotation, so the data dictionary can't document where the field data comes from. It'd be great if we could figure out something that doesn't require manual bookkeeping here.
    • This would be great, but I don't have a good idea of how to do this, so I'm not going to do it now.
  2. The super search query type is opaque--it's not clear what it means. Should we add details about how you search with the field?
    • This would be great, but I'm not sure how it should look and it overlaps with the super search docs.
  3. There are some Socorro-y fields that probably aren't usable by other people. Should we show them in the data dictionary? For example, the collector_notes.
    • I'm going to defer this for now.
  4. Redo the tables to use flex or grid or something that'll work with different viewport sizes.
    • We can tackle this when we redo the UI for the site.

Further, I want to add a new item:

  1. Fields need to be searchable. There's enough data now that if you're looking for a certain kind of information and don't know the field name, finding it is really hard. Maybe we should add search and filters and tags and such? This needs some thought and we shouldn't do anything unless the data dictionary is being used.

At this point, I think we have a good data dictionary now and I'm going to split off the remaining items we should do into new bugs and then close this bug out.

The unpushed bits were pushed to prod just now in bug #1809927. Marking as FIXED.

Status: ASSIGNED → RESOLVED
Closed: 28 days ago
Resolution: --- → FIXED