Closed Bug 1803558 Opened 3 years ago Closed 3 years ago

crash reporting data dictionary

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: willkg, Assigned: willkg)

References

(Blocks 2 open bugs)

Details

Attachments

(6 files)

- pr 6273: bug 1803558: data dictionary for crash reporting data (3 years ago, Will Kahn-Greene [:willkg] ET needinfo? me, 53 bytes, text/x-github-pull-request)
- pr 6274: bug 1803558: add link from annotation to processed field in docs (3 years ago, Will Kahn-Greene [:willkg] ET needinfo? me, 53 bytes, text/x-github-pull-request)
- pr 6275: bug 1803558: improve processed crash data and field doc template (3 years ago, Will Kahn-Greene [:willkg] ET needinfo? me, 53 bytes, text/x-github-pull-request)
- pr 6276: bug 1803558: show products for annotations (3 years ago, Will Kahn-Greene [:willkg] ET needinfo? me, 53 bytes, text/x-github-pull-request)
- pr 6280: bug 1803558: add caveats to application_build_id description (3 years ago, Will Kahn-Greene [:willkg] ET needinfo? me, 53 bytes, text/x-github-pull-request)
- pr 6306: bug 1803558: support nested schema items in data dictionary (3 years ago, Will Kahn-Greene [:willkg] ET needinfo? me, 53 bytes, text/x-github-pull-request)

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Description

3 years ago

Telemetry has the Glean Dictionary, which provides a data dictionary for Glean fields.

Now that we have schemas for the raw and processed crash data (bug #1626698 and bug #1764395), we should expose the schema information in the Socorro docs so that we (finally) have field documentation.

These docs would include:

- an index page of all raw and processed crash fields
- a page for each individual field consisting of:
  - field name
  - field description
  - whether the field is indexed in supersearch and if so, how (keyword? boolean? etc)
  - data review urls
  - source annotation
  - data gotchas

For now, I think we should write a script that generates these pages in the Sphinx docs, and check the generated pages into the repo. The documentation will then show up at socorro.readthedocs.org and be searchable.
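A minimal sketch of what such a generator script could look like, not the script that landed in the PRs. It assumes a simplified, JSON-schema-like structure; `EXAMPLE_SCHEMA`, the `search_type` key, the placeholder descriptions, and all function names are hypothetical and only illustrate the idea of one reStructuredText page per field plus an index page.

```python
from pathlib import Path

# Hypothetical, simplified schema. The real Socorro schemas may be laid out
# differently; the descriptions here are placeholders, not real field docs.
EXAMPLE_SCHEMA = {
    "properties": {
        "application_build_id": {
            "description": "Placeholder description for the build id field.",
            "type": "string",
            "source_annotation": "ApplicationBuildID",
            "search_type": "keyword",  # assumed key name for the supersearch type
        },
        "collector_notes": {
            "description": "Placeholder description for collector notes.",
            "type": "array",
        },
    }
}


def render_field_page(name, field):
    """Return reStructuredText for a single field page."""
    lines = [
        name,
        "=" * len(name),
        "",
        field.get("description", "(no description)"),
        "",
        f":Type: {field.get('type', 'unknown')}",
        f":Search type: {field.get('search_type', 'not indexed')}",
        f":Source annotation: {field.get('source_annotation', 'none')}",
        "",
    ]
    return "\n".join(lines)


def render_index_page(field_names, title="Processed crash fields"):
    """Return reStructuredText for an index page linking to every field page."""
    lines = [title, "=" * len(title), "", ".. toctree::", "   :maxdepth: 1", ""]
    lines.extend(f"   {name}" for name in sorted(field_names))
    lines.append("")
    return "\n".join(lines)


def generate_docs(schema, output_dir):
    """Write one page per field plus index.rst and return the written paths."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    fields = schema["properties"]
    written = []
    for name, field in sorted(fields.items()):
        path = output_dir / f"{name}.rst"
        path.write_text(render_field_page(name, field), encoding="utf-8")
        written.append(path)
    index_path = output_dir / "index.rst"
    index_path.write_text(render_index_page(fields), encoding="utf-8")
    written.append(index_path)
    return written
```

Calling `generate_docs(EXAMPLE_SCHEMA, "some/docs/dir")` would write `application_build_id.rst`, `collector_notes.rst`, and `index.rst` into that directory; the directory name is up to the caller.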

It does mean we have to regenerate the pages as an additional step after changing schemas. We can add a check for that in CI. That'll be good enough for now.
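A rough sketch of how such a CI check might work, assuming the generator can produce the expected page text in memory as a mapping of file name to content. `find_stale_pages` and `main` are hypothetical names, not part of Socorro; the idea is only to compare freshly generated text against what is checked in and return a non-zero exit code when they differ.

```python
from pathlib import Path


def find_stale_pages(expected_pages, docs_dir):
    """Return names of pages that are missing or differ from the expected text.

    expected_pages maps a file name relative to docs_dir to freshly
    generated page content.
    """
    docs_dir = Path(docs_dir)
    stale = []
    for name, expected in sorted(expected_pages.items()):
        path = docs_dir / name
        if not path.exists() or path.read_text(encoding="utf-8") != expected:
            stale.append(name)
    return stale


def main(expected_pages, docs_dir):
    """Print out-of-date pages and return an exit code suitable for CI."""
    stale = find_stale_pages(expected_pages, docs_dir)
    for name in stale:
        print(f"out of date: {name}")
    return 1 if stale else 0
```

A CI job could pass the return value of `main(...)` to `sys.exit` so the build fails whenever someone changes a schema without regenerating the docs.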

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Updated

3 years ago

Depends on: 1626698, 1764395

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Updated

3 years ago

Assignee: nobody → willkg

Status: NEW → ASSIGNED

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 1

3 years ago

Attached file pr 6273: bug 1803558: data dictionary for crash reporting data

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 2

3 years ago

willkg merged PR #6273: "bug 1803558: data dictionary for crash reporting data" in ef35fc6.

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Updated

3 years ago

Summary: document raw and processed fields → crash reporting data dictionary

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 3

3 years ago

Edited

Everything so far was pushed to production just now in bug #1803661.

First pass looks good. Future work I'm mulling over:

1. For annotations, I want to add the list of products that have emitted the annotation in the last 7 days.
2. For annotations for which there's a processed crash field with a source_annotation, we can add a link to the processed crash field.
3. Some processed crash fields specify a source_annotation and are normalized, validated, and copied over to the processed crash using the CopyFromRawCrashRule processor rule. For fields that are derived from annotations, or that are complex enough to need a special rule, there's no source_annotation, so the data dictionary can't document where the field data comes from. It'd be great if we could figure out something that doesn't require manual bookkeeping here.
4. Some processed crash fields are objects and those are effectively not documented at all, yet. How do we want to document them? For example breadcrumbs, java_exception, memory_report, json_dump and friends, etc. Maybe we display a "friendly" version of the schema?
5. The super search query type is opaque--it's not clear what it means. Should we add details about how you search with the field?
6. There are some Socorro-y fields that probably aren't usable by other people. Should we show them in the data dictionary? For example, the collector_notes.

I think I'll split off some/most of those, along with whatever other ideas come up from feedback next week, into separate bugs.
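For the object-valued fields mentioned above, one possible "friendly" view is a flat list of dotted paths. The sketch below is only an illustration of that idea under the assumption of a JSON-schema-like layout with `properties` and `items` keys; `flatten_schema` is a hypothetical helper, and the sub-field names in the usage note are made up.

```python
def flatten_schema(schema, prefix=""):
    """Yield (dotted_path, type, description) for every item in a nested schema.

    Objects are walked through "properties"; arrays of objects are walked
    through "items" and marked with "[]" in the path. This sketch only
    handles a single string in "type", not a list of types.
    """
    for name, item in sorted(schema.get("properties", {}).items()):
        path = f"{prefix}.{name}" if prefix else name
        item_type = item.get("type", "unknown")
        yield path, item_type, item.get("description", "")
        if item_type == "object":
            yield from flatten_schema(item, prefix=path)
        elif item_type == "array" and isinstance(item.get("items"), dict):
            yield from flatten_schema(item["items"], prefix=f"{path}[]")
```

For a schema where a hypothetical `breadcrumbs` array holds objects with a `message` property, this yields `breadcrumbs` followed by `breadcrumbs[].message`, which could then be rendered as rows in the field's page.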

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 4

3 years ago

Oh, and also:

Redo the tables to use flex or grid or something that'll work with different viewport sizes.

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 5

3 years ago

Attached file pr 6274: bug 1803558: add link from annotation to processed field in docs

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 6

3 years ago

willkg merged PR #6274: "bug 1803558: add link from annotation to processed field in docs" in b8a2002.

This implements item 2 from comment #3: for annotations where there's a processed crash field with a source annotation, there's now a link to the processed crash field.
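A minimal sketch of how such links could be derived, assuming each processed crash field may carry a `source_annotation` key in a `properties` mapping. This is not the code from the PR; `annotation_to_processed_fields` is a hypothetical helper that builds the reverse index an annotation page would need.

```python
def annotation_to_processed_fields(processed_schema):
    """Map each source annotation to the processed crash fields derived from it."""
    index = {}
    for field_name, field in sorted(processed_schema["properties"].items()):
        annotation = field.get("source_annotation")
        if annotation:
            index.setdefault(annotation, []).append(field_name)
    return index
```

An annotation page for `ApplicationBuildID` could then look up its entry in this index and link to `application_build_id`; fields without a `source_annotation` simply don't appear.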

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 7

3 years ago

Attached file pr 6275: bug 1803558: improve processed crash data and field doc template

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 8

3 years ago

willkg merged PR #6275: "bug 1803558: improve processed crash data and field doc template" in 7dced5e.

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 9

3 years ago

Attached file pr 6276: bug 1803558: show products for annotations

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 10

3 years ago

willkg merged PR #6276: "bug 1803558: show products for annotations" in 305733f.

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 11

3 years ago

Attached file pr 6280: bug 1803558: add caveats to application_build_id description

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 12

3 years ago

willkg merged PR #6280: "bug 1803558: add caveats to application_build_id description" in 62273cc.

This adds some gotcha notes for ApplicationBuildID and application_build_id.

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 13

3 years ago

Attached file pr 6306: bug 1803558: support nested schema items in data dictionary

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 14

3 years ago

willkg merged PR #6306: "bug 1803558: support nested schema items in data dictionary" in 7563d62.

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 15

3 years ago

Edited

Update on comment #3. I've done the following:

- Done: For annotations, I want to add the list of products that have emitted the annotation in the last 7 days.
- Done: For annotations for which there's a processed crash field with a source_annotation, we can add a link to the processed crash field.
- Done: Some processed crash fields are objects and those are effectively not documented at all, yet. How do we want to document them? For example breadcrumbs, java_exception, memory_report, json_dump and friends, etc. Maybe we display a "friendly" version of the schema?

These aren't done:

- Some processed crash fields specify a source_annotation and are normalized, validated, and copied over to the processed crash using the CopyFromRawCrashRule processor rule. For fields that are derived from annotations, or that are complex enough to need a special rule, there's no source_annotation, so the data dictionary can't document where the field data comes from. It'd be great if we could figure out something that doesn't require manual bookkeeping here.
  - This would be great, but I don't have a good idea of how to do this, so I'm not going to do it now.
- The super search query type is opaque--it's not clear what it means. Should we add details about how you search with the field?
  - This would be great, but I'm not sure how it should look and it overlaps with the super search docs.
- There are some Socorro-y fields that probably aren't usable by other people. Should we show them in the data dictionary? For example, the collector_notes.
  - I'm going to defer this for now.
- Redo the tables to use flex or grid or something that'll work with different viewport sizes.
  - We can tackle this when we redo the UI for the site.

Further, I want to add a new item:

Fields need to be searchable. There's enough data now that if you're looking for a certain kind of information and don't know the field name, finding it is really hard. Maybe we should add search and filters and tags and such? This needs some thought and we shouldn't do anything unless the data dictionary is being used.

At this point, I think we have a good data dictionary. I'm going to split off the remaining items we should do into new bugs and then close this bug out.

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Updated

3 years ago

Blocks: 1808946

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Updated

3 years ago

Blocks: 1808947

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 16

3 years ago

The unpushed bits were pushed to prod just now in bug #1809927. Marking as FIXED.

Status: ASSIGNED → RESOLVED

Closed: 3 years ago

Resolution: --- → FIXED
